An Introduction to Real Analysis 


John K. Hunter¹


DEPARTMENT OF MATHEMATICS, UNIVERSITY OF CALIFORNIA AT DAVIS 


¹The author was supported in part by the NSF. Thanks to Janko Gravner for a number of
corrections and comments.


ABSTRACT. These are some notes on introductory real analysis. They cover 
the properties of the real numbers, sequences and series of real numbers, limits 
of functions, continuity, differentiability, sequences and series of functions, and 
Riemann integration. They don’t include multi-variable calculus or contain 
any problem sets. Optional sections are starred. 


© John K. Hunter, 2014 


Contents

Chapter 1. Sets and Functions
1.1. Sets
1.2. Functions
1.3. Composition and inverses of functions
1.4. Indexed sets
1.5. Relations
1.6. Countable and uncountable sets

Chapter 2. Numbers
2.1. Integers
2.2. Rational numbers
2.3. Real numbers: algebraic properties
2.4. Real numbers: ordering properties
2.5. The supremum and infimum
2.6. Real numbers: completeness
2.7. Properties of the supremum and infimum

Chapter 3. Sequences
3.1. The absolute value
3.2. Sequences
3.3. Convergence and limits
3.4. Properties of limits
3.5. Monotone sequences
3.6. The lim sup and lim inf
3.7. Cauchy sequences
3.8. Subsequences
3.9. The Bolzano-Weierstrass theorem

Chapter 4. Series
4.1. Convergence of series
4.2. The Cauchy condition
4.3. Absolutely convergent series
4.4. The comparison test
4.5. * The Riemann ζ-function
4.6. The ratio and root tests
4.7. Alternating series
4.8. Rearrangements
4.9. The Cauchy product
4.10. * Double series
4.11. * The irrationality of e

Chapter 5. Topology of the Real Numbers
5.1. Open sets
5.2. Closed sets
5.3. Compact sets
5.4. Connected sets
5.5. * The Cantor set

Chapter 6. Limits of Functions
6.1. Limits
6.2. Left, right, and infinite limits
6.3. Properties of limits

Chapter 7. Continuous Functions
7.1. Continuity
7.2. Properties of continuous functions
7.3. Uniform continuity
7.4. Continuous functions and open sets
7.5. Continuous functions on compact sets
7.6. The intermediate value theorem
7.7. Monotonic functions

Chapter 8. Differentiable Functions
8.1. The derivative
8.2. Properties of the derivative
8.3. The chain rule
8.4. Extreme values
8.5. The mean value theorem

9.5. Series

10.3. Examples of power series
10.7. * Smooth versus analytic functions

11.2. Definition of the integral

Chapter 12. Properties and Applications of the Integral
12.4. Improper Riemann integrals

13.2. Normed spaces

Bibliography


Chapter 1 


Sets and Functions 


We understand a “set” to be any collection M of certain distinct objects 
of our thought or intuition (called the “elements” of M) into a whole. 
(Georg Cantor, 1895) 


In mathematics you don’t understand things. You just get used to them. 
(Attributed to John von Neumann) 


In this chapter, we define sets, functions, and relations and discuss some of 
their general properties. This material can be referred back to as needed in the 
subsequent chapters. 


1.1. Sets 


A set is a collection of objects, called the elements or members of the set. The 
objects could be anything (planets, squirrels, characters in Shakespeare’s plays, or 
other sets) but for us they will be mathematical objects such as numbers, or sets 
of numbers. We write x ∈ X if x is an element of the set X and x ∉ X if x is not
an element of X. 


If the definition of a “set” as a “collection” seems circular, that’s because it 
is. Conceiving of many objects as a single whole is a basic intuition that cannot 
be analyzed further, and the notions of “set” and “membership” are primitive
ones. These notions can be made mathematically precise by introducing a system 
of axioms for sets and membership that agrees with our intuition and proving other 
set-theoretic properties from the axioms. 


The most commonly used axioms for sets are the ZFC axioms, named somewhat 
inconsistently after two of their founders (Zermelo and Fraenkel) and one of their 
axioms (the Axiom of Choice). We won’t state these axioms here; instead, we use 
“naive” set theory, based on the intuitive properties of sets. Nevertheless, all the 
set-theory arguments we use can be rigorously formalized within the ZFC system. 




Sets are determined entirely by their elements. Thus, the sets X, Y are equal, 
written X = Y, if 
x ∈ X if and only if x ∈ Y.


It is convenient to define the empty set, denoted by Ø, as the set with no elements. 
(Since sets are determined by their elements, there is only one set with no elements!) 
If X ≠ Ø, meaning that X has at least one element, then we say that X is non-empty.
We can define a finite set by listing its elements (between curly brackets). For 
example, 
X = {2,3,5,7,11} 


is a set with five elements. The order in which the elements are listed or repetitions 
of the same element are irrelevant. Alternatively, we can define X as the set whose 
elements are the first five prime numbers. It doesn’t matter how we specify the 
elements of X, only that they are the same. 


Infinite sets can’t be defined by explicitly listing all of their elements. Never- 
theless, we will adopt a realist (or “platonist”) approach towards arbitrary infinite 
sets and regard them as well-defined totalities. In constructive mathematics and 
computer science, one may be interested only in sets that can be defined by a rule or 
algorithm — for example, the set of all prime numbers — rather than by infinitely 
many arbitrary specifications, and there are some mathematicians who consider 
infinite sets to be meaningless without some way of constructing them. Similar 
issues arise with the notion of arbitrary subsets, functions, and relations. 


1.1.1. Numbers. The infinite sets we use are derived from the natural and real 
numbers, about which we have a direct intuitive understanding. 


Our understanding of the natural numbers 1,2,3,... derives from counting. 
We denote the set of natural numbers by 
N = {1, 2, 3, ...}.


We define N so that it starts at 1. In set theory and logic, the natural numbers 
are defined to start at zero, but we denote this set by N_0 = {0, 1, 2, ...}. Historically,
the number 0 was a later addition to the number system, primarily by Indian
mathematicians in the 5th century AD. The ancient Greek mathematicians, such 
as Euclid, defined a number as a multiplicity and didn’t consider 1 to be a number 
either. 


Our understanding of the real numbers derives from durations of time and 
lengths in space. We think of the real line, or continuum, as being composed of an 
(uncountably) infinite number of points, each of which corresponds to a real number, 
and denote the set of real numbers by R. There are philosophical questions, going 
back at least to Zeno’s paradoxes, about whether the continuum can be represented 
as a set of points, and a number of mathematicians have disputed this assumption 
or introduced alternative models of the continuum. There are, however, no known 
inconsistencies in treating R as a set of points, and since Cantor’s work it has been 
the dominant point of view in mathematics because of its precision, power, and 
simplicity. 




We denote the set of (positive, negative and zero) integers by 
Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}
and the set of rational numbers (ratios of integers) by
Q = {p/q : p, q ∈ Z and q ≠ 0}.


The letter “Z” comes from “zahl” (German for “number”) and “Q” comes from 
“quotient.” These number systems are discussed further in Chapter 2.


Although we will not develop any complex analysis here, we occasionally make 
use of complex numbers. We denote the set of complex numbers by 


C = {x + iy : x, y ∈ R},
where we add and multiply complex numbers in the natural way, with the additional
identity that i² = −1, meaning that i is a square root of −1. If z = x + iy ∈ C, we
call x = ℜz the real part of z and y = ℑz the imaginary part of z, and we call

|z| = √(x² + y²)

the absolute value, or modulus, of z. Two complex numbers z = x + iy, w = u + iv
are equal if and only if x = u and y = v.


1.1.2. Subsets. A set A is a subset of a set X, written A ⊂ X or X ⊃ A, if
every element of A belongs to X; that is, if

x ∈ A implies that x ∈ X.

We also say that A is included in X.¹ For example, if P is the set of prime numbers,
then P ⊂ N, and N ⊂ R. The empty set Ø and the whole set X are subsets of any
set X. Note that X = Y if and only if X ⊂ Y and Y ⊂ X; we often prove the
equality of two sets by showing that each one includes the other.


In our notation, A ⊂ X does not imply that A is a proper subset of X (that
is, a subset of X not equal to X itself), and we may have A = X. This notation
for non-strict inclusion is not universal; some authors use A ⊂ X to denote strict
inclusion, in which A ≠ X, and A ⊆ X to denote non-strict inclusion, in which
A = X is allowed.


Definition 1.1. The power set P(X) of a set X is the set of all subsets of X. 
Example 1.2. If X = {1,2,3}, then 
P(X) = {Ø, {1}, {2}, {3}, {2, 3}, {1, 3}, {1, 2}, {1, 2, 3}}.


The power set of a finite set with n elements has 2^n elements because, in
defining a subset, we have two independent choices for each element (does it belong
to the subset or not?). In Example 1.2, X has 3 elements and P(X) has 2³ = 8
elements.
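
The counting argument for the power set of a finite set can be checked by direct enumeration. The following is a minimal sketch in Python; the helper name power_set is our own choice for illustration and is not part of the notes.

```python
from itertools import combinations

def power_set(xs):
    """Return all subsets of the finite collection xs as a list of frozensets."""
    items = list(set(xs))  # repetitions are irrelevant for a set
    subsets = []
    for k in range(len(items) + 1):
        # combinations(items, k) yields each k-element subset exactly once
        for combo in combinations(items, k):
            subsets.append(frozenset(combo))
    return subsets

X = {1, 2, 3}
P = power_set(X)
print(len(P) == 2 ** len(X))  # compares the number of subsets with 2^n
```

Frozensets are used because ordinary Python sets are mutable and cannot themselves be elements of a set.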


The power set of an infinite set, such as N, consists of all finite and infinite
subsets and is infinite. We can define finite subsets of N, or subsets with finite
complements, by listing finitely many elements. Some infinite subsets, such as
the set of primes or the set of squares, can be defined by giving a definite rule
for membership. We imagine that a general subset A ⊂ N is “defined” by going
through the elements of N one by one and deciding for each n ∈ N whether n ∈ A
or n ∉ A.

¹By contrast, we say that an element x ∈ X is contained in X, in which case the singleton set
{x} is included in X. This terminological distinction is not universal, but it is almost always clear from
the context whether one is referring to an element of a set or a subset of a set. In fact, before the
development of the contemporary notation for set theory, Dedekind used the same symbol (⊂) to
denote both membership of elements and inclusion of subsets.


If X is a set and P is a property of elements of X, we denote the subset of X 
consisting of elements with the property P by {x ∈ X : P(x)}.


Example 1.3. The set 
{n ∈ N : n = k² for some k ∈ N}
is the set of perfect squares {1, 4, 9, 16, 25, ...}. The set
{x ∈ R : 0 < x < 1}
is the open interval (0, 1). 

1.1.3. Set operations. The intersection A ∩ B of two sets A, B is the set of
all elements that belong to both A and B; that is,
x ∈ A ∩ B if and only if x ∈ A and x ∈ B.

Two sets A, B are said to be disjoint if A ∩ B = Ø; that is, if A and B have no
elements in common.

The union A ∪ B is the set of all elements that belong to A or B; that is,
x ∈ A ∪ B if and only if x ∈ A or x ∈ B.
Note that we always use ‘or’ in an inclusive sense, so that x ∈ A ∪ B if x is an
element of A or B, or both A and B. (Thus, A ∩ B ⊂ A ∪ B.)

The set-difference of two sets B and A is the set of elements of B that do not
belong to A,

B \ A = {x ∈ B : x ∉ A}.

If we consider sets that are subsets of a fixed set X that is understood from the
context, then we write A^c = X \ A to denote the complement of A ⊂ X in X. Note
that (A^c)^c = A.


Example 1.4. If 
A = {2,3,5,7,11}, B = {1,3,5,7,9,11} 


then 
A ∩ B = {3, 5, 7, 11}, A ∪ B = {1, 2, 3, 5, 7, 9, 11}.
Thus, A ∩ B consists of the natural numbers between 1 and 11 that are both prime
and odd, while A ∪ B consists of the numbers that are either prime or odd (or
both). The set differences of these sets are

B \ A = {1, 9}, A \ B = {2}.


Thus, B \ A is the set of odd numbers between 1 and 11 that are not prime, and 
A \ B is the set of prime numbers that are not odd. 
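
The computations in Example 1.4 can be reproduced with Python's built-in set type, whose operators &, | and - correspond to intersection, union and set-difference. This is only an illustrative sketch; the variable names are our own.

```python
# The sets of Example 1.4.
A = {2, 3, 5, 7, 11}        # primes between 1 and 11
B = {1, 3, 5, 7, 9, 11}     # odd numbers between 1 and 11

intersection = A & B        # elements in both A and B
union = A | B               # elements in A or B (inclusive 'or')
b_minus_a = B - A           # elements of B that do not belong to A
a_minus_b = A - B           # elements of A that do not belong to B

print(sorted(intersection), sorted(union), sorted(b_minus_a), sorted(a_minus_b))
```

Listing an element twice or in a different order, as in {2, 2, 3} and {3, 2}, produces the same Python set, in line with the remark that order and repetition are irrelevant.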




These set operations may be represented by Venn diagrams, which can be used 
to visualize their properties. In particular, if A, B ⊂ X, we have De Morgan’s laws:

(A ∪ B)^c = A^c ∩ B^c, (A ∩ B)^c = A^c ∪ B^c.
The definitions of union and intersection extend to larger collections of sets in 
a natural way. 
Definition 1.5. Let C be a collection of sets. Then the union of C is
⋃C = {x : x ∈ X for some X ∈ C},

and the intersection of C is
⋂C = {x : x ∈ X for every X ∈ C}.

If C = {A, B}, then this definition reduces to our previous one for A ∪ B and
A ∩ B.


The Cartesian product X × Y of sets X, Y is the set of all ordered pairs (x, y)
with x ∈ X and y ∈ Y. If X = Y, we often write X × X = X². Two ordered
pairs (x_1, y_1), (x_2, y_2) in X × Y are equal if and only if x_1 = x_2 and y_1 = y_2. Thus,
(x, y) ≠ (y, x) unless x = y. This contrasts with sets where {x, y} = {y, x}.


Example 1.6. If X = {1, 2, 3} and Y = {4, 5} then

X × Y = {(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)}.
Example 1.7. The Cartesian product of R with itself is the Cartesian plane R²
consisting of all points with coordinates (x, y) where x, y ∈ R.


The Cartesian product of finitely many sets is defined analogously. 


Definition 1.8. The Cartesian product of n sets X_1, X_2, ..., X_n is the set of
ordered n-tuples,
X_1 × X_2 × ··· × X_n = {(x_1, x_2, ..., x_n) : x_i ∈ X_i for i = 1, 2, ..., n},

where (x_1, x_2, ..., x_n) = (y_1, y_2, ..., y_n) if and only if x_i = y_i for every i =
1, 2, ..., n.
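
For finite sets, Cartesian products can be enumerated directly. The sketch below uses itertools.product from the Python standard library on the sets of Example 1.6; the variable names are illustrative only.

```python
from itertools import product

X = [1, 2, 3]
Y = [4, 5]

# All ordered pairs (x, y) with x in X and y in Y, as in Example 1.6.
pairs = list(product(X, Y))

# A product of three sets gives ordered triples (3-tuples).
triples = list(product(X, Y, X))

print(pairs)
print(len(triples) == len(X) * len(Y) * len(X))
```

Python tuples compare componentwise, like ordered pairs, so (1, 2) and (2, 1) are different tuples even though {1, 2} and {2, 1} are the same set.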


1.2. Functions 


A function f : X → Y between sets X, Y assigns to each x ∈ X a unique element
f(x) ∈ Y. Functions are also called maps, mappings, or transformations. The set
X on which f is defined is called the domain of f and the set Y in which it takes
its values is called the codomain. We write f : x ↦ f(x) to indicate that f is the
function that maps x to f(x).


Example 1.9. The identity function id_X : X → X on a set X is the function
id_X : x ↦ x that maps every element to itself.


Example 1.10. Let A ⊂ X. The characteristic (or indicator) function of A,
χ_A : X → {0, 1},
is defined by

χ_A(x) = 1 if x ∈ A,
χ_A(x) = 0 if x ∉ A.

Specifying the function χ_A is equivalent to specifying the subset A.
Example 1.11. Let A, B be the sets in Example 1.4. We can define a function
f : A → B by
f(2) = 7, f(3) = 1, f(5) = 11, f(7) = 3, f(11) = 9,
and a function g : B → A by
g(1) = 3, g(3) = 7, g(5) = 2, g(7) = 2, g(9) = 5, g(11) = 11.
Example 1.12. The square function f : N → N is defined by
f(n) = n²,

which we also write as f : n ↦ n². The equation g(n) = √n, where √n is the
positive square root, defines a function g : N → R, but h(n) = ±√n does not define
a function since it doesn’t specify a unique value for h(n). Sometimes we use a
convenient oxymoron and refer to h as a multi-valued function.


One way to specify a function is to explicitly list its values, as in Example 1.11.
Another way is to give a definite rule, as in Example 1.12. If X is infinite and f is
not given by a definite rule, then neither of these methods can be used to specify
the function. Nevertheless, we suppose that a general function f : X → Y may be
“defined” by picking for each x ∈ X a corresponding value f(x) ∈ Y.


If f : X → Y and U ⊂ X, then we denote the restriction of f to U by
f|_U : U → Y, where f|_U(x) = f(x) for x ∈ U.


In defining a function f : X → Y, it is crucial to specify the domain X of
elements on which it is defined. There is more ambiguity about the choice of
codomain, however, since we can extend the codomain to any set Z ⊃ Y and define
a function g : X → Z by g(x) = f(x). Strictly speaking, even though f and g
have exactly the same values, they are different functions since they have different 
codomains. Usually, however, we will ignore this distinction and regard f and g as 
being the same function. 


The graph of a function f : X → Y is the subset G_f of X × Y defined by
G_f = {(x, y) ∈ X × Y : x ∈ X and y = f(x)}.

For example, if f : R → R, then the graph of f is the usual set of points (x, y) with
y = f(x) in the Cartesian plane R². Since a function is defined at every point in
its domain, there is some point (x, y) ∈ G_f for every x ∈ X, and since the value of
a function is uniquely defined, there is exactly one such point. In other words, for
each x ∈ X the “vertical line” L_x = {(x, y) ∈ X × Y : y ∈ Y} through x intersects
the graph of a function f : X → Y in exactly one point: L_x ∩ G_f = {(x, f(x))}.


Definition 1.13. The range, or image, of a function f : X → Y is the set of values
ran f = {y ∈ Y : y = f(x) for some x ∈ X}.
A function is onto if its range is all of Y; that is, if

for every y ∈ Y there exists x ∈ X such that y = f(x).

A function is one-to-one if it maps distinct elements of X to distinct elements of
Y; that is, if

x_1, x_2 ∈ X and x_1 ≠ x_2 implies that f(x_1) ≠ f(x_2).
An onto function is also called a surjection, a one-to-one function an injection, and
a one-to-one, onto function a bijection.
Example 1.14. The function f : A → B defined in Example 1.11 is one-to-one but
not onto, since 5 ∉ ran f, while the function g : B → A is onto but not one-to-one,
since g(5) = g(7).
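
The claims in Example 1.14 can be checked mechanically for functions on finite sets. In the sketch below we represent a function by a Python dictionary of its values, taking f and g to be the maps of Example 1.11; the helper names is_one_to_one and is_onto are our own.

```python
A = {2, 3, 5, 7, 11}
B = {1, 3, 5, 7, 9, 11}

# f : A -> B and g : B -> A, listed by their values.
f = {2: 7, 3: 1, 5: 11, 7: 3, 11: 9}
g = {1: 3, 3: 7, 5: 2, 7: 2, 9: 5, 11: 11}

def is_one_to_one(mapping):
    """Distinct inputs have distinct values: no value is repeated."""
    values = list(mapping.values())
    return len(values) == len(set(values))

def is_onto(mapping, codomain):
    """The range (set of values) is the whole codomain."""
    return set(mapping.values()) == set(codomain)

print(is_one_to_one(f), is_onto(f, B))
print(is_one_to_one(g), is_onto(g, A))
```

Note that is_onto needs the codomain as a separate argument: the dictionary records only the domain and the values, so whether a function is onto depends on which codomain we have in mind.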


1.3. Composition and inverses of functions 


The successive application of mappings leads to the notion of the composition of 
functions. 


Definition 1.15. The composition of functions f : X → Y and g : Y → Z is the
function g ∘ f : X → Z defined by

(g ∘ f)(x) = g(f(x)).


The order of application of the functions in a composition is crucial and is read
from right to left. The composition g ∘ f can only be defined if the domain of
g includes the range of f, and the existence of g ∘ f does not imply that f ∘ g even
makes sense.


Example 1.16. Let X be the set of students in a class and f : X → N the function
that maps a student to her age. Let g : N → N be the function that adds up the
digits in a number, e.g., g(1729) = 19. If x ∈ X is 23 years old, then (g ∘ f)(x) = 5,
but (f ∘ g)(x) makes no sense, since students in the class are not natural numbers.

Even if both g ∘ f and f ∘ g are defined, they are, in general, different functions.
Example 1.17. If f : A → B and g : B → A are the functions in Example 1.11,
then g ∘ f : A → A is given by
(g ∘ f)(2) = 2, (g ∘ f)(3) = 3, (g ∘ f)(5) = 11,
(g ∘ f)(7) = 7, (g ∘ f)(11) = 5,
and f ∘ g : B → B is given by
(f ∘ g)(1) = 1, (f ∘ g)(3) = 3, (f ∘ g)(5) = 7,
(f ∘ g)(7) = 7, (f ∘ g)(9) = 11, (f ∘ g)(11) = 9.
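
The two compositions in Example 1.17 can be computed from the value tables of f and g. The following sketch again stores the functions of Example 1.11 as dictionaries; the helper compose is a name we introduce for illustration.

```python
# f : A -> B and g : B -> A from Example 1.11, listed by their values.
f = {2: 7, 3: 1, 5: 11, 7: 3, 11: 9}
g = {1: 3, 3: 7, 5: 2, 7: 2, 9: 5, 11: 11}

def compose(outer, inner):
    """Return outer ∘ inner: apply inner first, then outer (right to left)."""
    return {x: outer[inner[x]] for x in inner}

g_after_f = compose(g, f)   # g ∘ f : A -> A
f_after_g = compose(f, g)   # f ∘ g : B -> B

print(g_after_f)
print(f_after_g)
```

The lookup outer[inner[x]] raises a KeyError if a value of inner lies outside the domain of outer, which mirrors the requirement that the domain of the outer function include the range of the inner one.
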

A one-to-one, onto function f : X → Y has an inverse f⁻¹ : Y → X defined by
f⁻¹(y) = x if and only if f(x) = y.
Equivalently, f⁻¹ ∘ f = id_X and f ∘ f⁻¹ = id_Y. A value f⁻¹(y) is defined for every
y ∈ Y since f is onto, and it is unique since f is one-to-one. If f : X → Y is
one-to-one but not onto, then one can still define an inverse function f⁻¹ : ran f → X
whose domain is the range of f.


The use of the notation f⁻¹ to denote the inverse function should not be confused
with its use to denote the reciprocal function; it should be clear from the
context which meaning is intended.




Example 1.18. If f : R → R is the function f(x) = x³, which is one-to-one and
onto, then the inverse function f⁻¹ : R → R is given by

f⁻¹(x) = x^{1/3}.
On the other hand, the reciprocal function g = 1/f is given by

g(x) = 1/x³, g : R \ {0} → R.
The reciprocal function is not defined at x = 0 where f(x) = 0.


If f : X → Y and A ⊂ X, then we let
f(A) = {y ∈ Y : y = f(x) for some x ∈ A}
denote the set of values of f on points in A. Similarly, if B ⊂ Y, we let
f⁻¹(B) = {x ∈ X : f(x) ∈ B}

denote the set of points in X whose values belong to B. Note that f⁻¹(B) makes
sense as a set even if the inverse function f⁻¹ : Y → X does not exist.


Example 1.19. Define f : R → R by f(x) = x². If A = (−2, 2), then f(A) = [0, 4).
If B = (0, 4), then
f⁻¹(B) = (−2, 0) ∪ (0, 2).
If C = (−4, 0), then f⁻¹(C) = Ø.

Finally, we introduce operations on a set.

Definition 1.20. A binary operation on a set X is a function f : X × X → X.

We think of f as “combining” two elements of X to give another element
of X. One can also consider higher-order operations, such as ternary operations
f : X × X × X → X, but we will only use binary operations.


Example 1.21. Addition a : N × N → N and multiplication m : N × N → N are
binary operations on N where

a(x, y) = x + y, m(x, y) = xy.


1.4. Indexed sets

We say that a set X is indexed by a set I, or X is an indexed set, if there is an
onto function f : I → X. We then write
X = {x_i : i ∈ I}
where x_i = f(i). For example,
{1, 4, 9, 16, ...} = {n² : n ∈ N}.


The set X itself is the range of the indexing function f, and it doesn’t depend on 
how we index it. If f isn’t one-to-one, then some elements are repeated, but this 
doesn’t affect the definition of the set X. For example, 


{−1, 1} = {(−1)^n : n ∈ N} = {(−1)^{n+1} : n ∈ N}.




If C = {X_i : i ∈ I} is an indexed collection of sets X_i, then we denote the union
and intersection of the sets in C by

⋃_{i∈I} X_i = {x : x ∈ X_i for some i ∈ I}, ⋂_{i∈I} X_i = {x : x ∈ X_i for every i ∈ I},
or similar notation.
Example 1.22. For n ∈ N, define the intervals
A_n = [1/n, 1 − 1/n] = {x ∈ R : 1/n ≤ x ≤ 1 − 1/n},
B_n = (−1/n, 1/n) = {x ∈ R : −1/n < x < 1/n}.
Then
⋃_{n∈N} A_n = ⋃_{n=1}^∞ A_n = (0, 1), ⋂_{n∈N} B_n = ⋂_{n=1}^∞ B_n = {0}.

The general statement of De Morgan’s laws for a collection of sets is as follows.

Proposition 1.23 (De Morgan). If {X_i ⊂ X : i ∈ I} is a collection of subsets of
a set X, then

(⋃_{i∈I} X_i)^c = ⋂_{i∈I} X_i^c, (⋂_{i∈I} X_i)^c = ⋃_{i∈I} X_i^c.

Proof. We have x ∉ ⋃_{i∈I} X_i if and only if x ∉ X_i for every i ∈ I, which holds
if and only if x ∈ ⋂_{i∈I} X_i^c. Similarly, x ∉ ⋂_{i∈I} X_i if and only if x ∉ X_i for some
i ∈ I, which holds if and only if x ∈ ⋃_{i∈I} X_i^c.
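
De Morgan's laws for an indexed collection can be spot-checked on a small finite example. In the sketch below, the ambient set X = {0, 1, ..., 9} and the family X_i of multiples of i for i ∈ {2, 3, 5} are our own choices for illustration; a check of one example is of course not a proof.

```python
X = set(range(10))  # ambient set {0, 1, ..., 9}

# Indexed collection {X_i : i in I} with I = {2, 3, 5}: X_i = multiples of i in X.
family = {i: {x for x in X if x % i == 0} for i in (2, 3, 5)}

def complement(S):
    """Complement of S in the fixed ambient set X."""
    return X - S

union_all = set().union(*family.values())
intersection_all = set(X).intersection(*family.values())

# Left- and right-hand sides of the two laws.
lhs_union = complement(union_all)
rhs_union = set(X).intersection(*(complement(S) for S in family.values()))
lhs_inter = complement(intersection_all)
rhs_inter = set().union(*(complement(S) for S in family.values()))

print(lhs_union == rhs_union, lhs_inter == rhs_inter)
```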


The following theorem summarizes how unions and intersections map under
functions.

Theorem 1.24. Let f : X → Y be a function. If {Y_j ⊂ Y : j ∈ J} is a collection
of subsets of Y, then

f⁻¹(⋃_{j∈J} Y_j) = ⋃_{j∈J} f⁻¹(Y_j), f⁻¹(⋂_{j∈J} Y_j) = ⋂_{j∈J} f⁻¹(Y_j),

and if {X_i ⊂ X : i ∈ I} is a collection of subsets of X, then

f(⋃_{i∈I} X_i) = ⋃_{i∈I} f(X_i), f(⋂_{i∈I} X_i) ⊂ ⋂_{i∈I} f(X_i).

Proof. We prove only the results for the inverse image of a union and the image
of an intersection; the proof of the remaining two results is similar.

If x ∈ f⁻¹(⋃_{j∈J} Y_j), then there exists y ∈ ⋃_{j∈J} Y_j such that f(x) = y. Then
y ∈ Y_j for some j ∈ J and x ∈ f⁻¹(Y_j), so x ∈ ⋃_{j∈J} f⁻¹(Y_j). It follows that

f⁻¹(⋃_{j∈J} Y_j) ⊂ ⋃_{j∈J} f⁻¹(Y_j).

Conversely, if x ∈ ⋃_{j∈J} f⁻¹(Y_j), then x ∈ f⁻¹(Y_j) for some j ∈ J, so f(x) ∈ Y_j
and f(x) ∈ ⋃_{j∈J} Y_j, meaning that x ∈ f⁻¹(⋃_{j∈J} Y_j). It follows that

⋃_{j∈J} f⁻¹(Y_j) ⊂ f⁻¹(⋃_{j∈J} Y_j),

which proves that the sets are equal.

If y ∈ f(⋂_{i∈I} X_i), then there exists x ∈ ⋂_{i∈I} X_i such that f(x) = y. Then
x ∈ X_i and y ∈ f(X_i) for every i ∈ I, meaning that y ∈ ⋂_{i∈I} f(X_i). It follows
that

f(⋂_{i∈I} X_i) ⊂ ⋂_{i∈I} f(X_i).


The only case in which we don’t always have equality is for the image of an 
intersection, and we may get strict inclusion here if f is not one-to-one. 
Example 1.25. Define f : R → R by f(x) = x². Let A = (−1, 0) and B = (0, 1).
Then A ∩ B = Ø and f(A ∩ B) = Ø, but f(A) = f(B) = (0, 1), so f(A) ∩ f(B) =
(0, 1) ≠ f(A ∩ B).
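
A finite analogue of Example 1.25 shows the same behaviour. The sketch below restricts the squaring function to the integers from −3 to 3, which is our own simplification of the example; the helpers image and preimage are illustrative names for f(A) and f⁻¹(B).

```python
def image(f, A):
    """f(A): the set of values of f on points of A."""
    return {f(x) for x in A}

def preimage(f, domain, B):
    """f^{-1}(B): the points of the domain whose values belong to B."""
    return {x for x in domain if f(x) in B}

def square(x):
    return x * x

X = set(range(-3, 4))          # finite stand-in for the domain
A = {-3, -2, -1}
B = {1, 2, 3}

# Image of an intersection can be strictly smaller than the intersection of images.
image_of_intersection = image(square, A & B)
intersection_of_images = image(square, A) & image(square, B)

# Inverse images commute with both unions and intersections.
Y1, Y2 = {1, 4}, {4, 9}
pre_union = preimage(square, X, Y1 | Y2)
pre_inter = preimage(square, X, Y1 & Y2)

print(image_of_intersection, intersection_of_images)
```

Here A and B are disjoint, yet their images coincide because squaring is not one-to-one on X.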

Next, we generalize the Cartesian product of finitely many sets to the product 
of possibly infinitely many sets. 


Definition 1.26. Let C = {X_i : i ∈ I} be an indexed collection of sets X_i. The
Cartesian product of C is the set of functions that assign to each index i ∈ I an
element x_i ∈ X_i. That is,

∏_{i∈I} X_i = {f : I → ⋃_{i∈I} X_i : f(i) ∈ X_i for every i ∈ I}.

For example, if I = {1, 2, ..., n}, then f defines an ordered n-tuple of elements
(x_1, x_2, ..., x_n) with x_i = f(i) ∈ X_i, so this definition is equivalent to our previous
one.

If X_i = X for every i ∈ I, then ∏_{i∈I} X_i is simply the set of functions from I
to X, and we also write it as
X^I = {f : I → X}.
We can think of this set as the set of ordered I-tuples of elements of X.


Example 1.27. A sequence of real numbers (x_1, x_2, x_3, ..., x_n, ...) ∈ R^N is a
function f : N → R. We study sequences and their convergence properties in
Chapter 3.


Example 1.28. Let 2 = {0, 1} be a set with two elements. Then a subset A ⊂ I
can be identified with its characteristic function χ_A : I → 2 by: i ∈ A if and only
if χ_A(i) = 1. Thus, A ↦ χ_A is a one-to-one map from P(I) onto 2^I.


Before giving another example, we introduce some convenient notation. 


Definition 1.29. Let
Σ = {(s_1, s_2, s_3, ..., s_k, ...) : s_k = 0, 1}


denote the set of all binary sequences; that is, sequences whose terms are either 0 
or 1. 


Example 1.30. Let 2 = {0, 1}. Then Σ = 2^N, where we identify a sequence
(s_1, s_2, ..., s_k, ...) with the function f : N → 2 such that s_k = f(k). We can
also identify Σ and 2^N with P(N) as in Example 1.28. For example, the sequence
(1, 0, 1, 0, 1, ...) of alternating ones and zeros corresponds to the function f : N → 2
defined by

f(k) = 1 if k is odd,
f(k) = 0 if k is even,

and to the set {1, 3, 5, 7, ...} ⊂ N of odd natural numbers.
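
The identification of subsets with binary sequences can be illustrated on finite truncations. The sketch below works with subsets of {1, ..., n} rather than of all of N, which is a simplifying assumption on our part, and the function names are our own.

```python
from itertools import product

def subset_to_bits(A, n):
    """First n terms of the binary sequence of A: s_k = 1 if k is in A, else 0."""
    return tuple(1 if k in A else 0 for k in range(1, n + 1))

def bits_to_subset(bits):
    """Recover the subset {k : s_k = 1} from a finite binary sequence."""
    return {k for k, s in enumerate(bits, start=1) if s == 1}

odds = {k for k in range(1, 9) if k % 2 == 1}
bits = subset_to_bits(odds, 8)
print(bits)

# For n = 3 the correspondence between binary 3-tuples and subsets of {1, 2, 3}
# is one-to-one and onto: 2^3 tuples give 2^3 distinct subsets.
all_subsets = {frozenset(bits_to_subset(b)) for b in product((0, 1), repeat=3)}
print(len(all_subsets))
```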


1.5. Relations 


A binary relation R on sets X and Y is a definite relation between elements of X 
and elements of Y. We write xRy if x € X and y € Y are related. One can also 
define relations on more than two sets, but we shall consider only binary relations 
and refer to them simply as relations. If X = Y, then we call R a relation on X. 


Example 1.31. Suppose that S is a set of students enrolled in a university and B
is a set of books in a library. We might define a relation R on S and B by:

s ∈ S has read b ∈ B.

In that case, sRb if and only if s has read b. Another, probably inequivalent,
relation is:
s ∈ S has checked b ∈ B out of the library.


When used informally, relations may be ambiguous (did s read b if she only 
read the first page?), but in mathematical usage we always require that relations 
are definite, meaning that one and only one of the statements “these elements are 
related” or “these elements are not related” is true. 


The graph G_R of a relation R on X and Y is the subset of X × Y defined by
G_R = {(x, y) ∈ X × Y : xRy}.

This graph contains all of the information about which elements are related. Conversely,
any subset G ⊂ X × Y defines a relation R by: xRy if and only if (x, y) ∈ G.
Thus, a relation on X and Y may be (and often is) defined as a subset of X × Y. As
for sets, it doesn’t matter how a relation is defined, only what elements are related.


A function f : X → Y determines a relation F on X and Y by: xFy if and
only if y = f(x). Thus, functions are a special case of relations. The graph G_R of
a general relation differs from the graph G_F of a function in two ways: there may
be elements x ∈ X such that (x, y) ∉ G_R for any y ∈ Y, and there may be x ∈ X
such that (x, y) ∈ G_R for many y ∈ Y.


For example, in the case of the relation R in Example 1.31, there may be some
students who haven’t read any books, and there may be other students who have
read lots of books, in which case we don’t have a well-defined function from students
to books.


Two important types of relations are orders and equivalence relations, and we 
define them next. 


1.5.1. Orders. A primary example of an order is the standard order ≤ on the
natural (or real) numbers. This order is a linear or total order, meaning that two
numbers are always comparable. Another example of an order is inclusion ⊂ on the
power set of some set; one set is “smaller” than another set if it is included in it.
This order is a partial order (provided the original set has at least two elements),
meaning that two subsets need not be comparable.

Example 1.32. Let X = {1, 2}. The collection of subsets of X is

P(X) = {Ø, A, B, X}, A = {1}, B = {2}.
We have Ø ⊂ A ⊂ X and Ø ⊂ B ⊂ X, but A ⊄ B and B ⊄ A, so A and B are not
comparable under ordering by inclusion.


The general definition of an order is as follows. 


Definition 1.33. An order ⪯ on a set X is a binary relation on X such that for
every x, y, z ∈ X:

(a) x ⪯ x (reflexivity);

(b) if x ⪯ y and y ⪯ x then x = y (antisymmetry);

(c) if x ⪯ y and y ⪯ z then x ⪯ z (transitivity).

An order is a linear, or total, order if for every x, y ∈ X either x ⪯ y or y ⪯ x,
otherwise it is a partial order.


If ⪯ is an order, then we also write y ⪰ x instead of x ⪯ y, and we define a
corresponding strict order ≺ by
x ≺ y if x ⪯ y and x ≠ y.
There are many ways to order a given set (with two or more elements).


Example 1.34. Let X be a set. One way to partially order the subsets of X is by
inclusion, as in Example 1.32. Another way is to say that A ⪯ B for A, B ⊂ X if
and only if A ⊃ B, meaning that A is “smaller” than B if A includes B. Then ⪯
is an order on P(X), called ordering by reverse inclusion.
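
On a finite set, the three order axioms and the totality condition can be verified by brute force. The following sketch checks inclusion and reverse inclusion on P({1, 2}), as in Examples 1.32 and 1.34; the function names is_order and is_total are our own, and the relation is passed as a Python function leq(x, y) that returns True when x ⪯ y.

```python
from itertools import combinations, product

def subsets(xs):
    """All subsets of a finite set, as frozensets."""
    items = list(xs)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def is_order(elements, leq):
    """Check reflexivity, antisymmetry and transitivity of leq on elements."""
    reflexive = all(leq(x, x) for x in elements)
    antisymmetric = all(x == y or not (leq(x, y) and leq(y, x))
                        for x, y in product(elements, repeat=2))
    transitive = all(leq(x, z) or not (leq(x, y) and leq(y, z))
                     for x, y, z in product(elements, repeat=3))
    return reflexive and antisymmetric and transitive

def is_total(elements, leq):
    """Check that any two elements are comparable."""
    return all(leq(x, y) or leq(y, x) for x, y in product(elements, repeat=2))

P = subsets({1, 2})
inclusion = lambda a, b: a <= b          # a is a subset of b
reverse_inclusion = lambda a, b: a >= b  # a includes b

print(is_order(P, inclusion), is_total(P, inclusion))
print(is_order(P, reverse_inclusion), is_total(P, reverse_inclusion))
```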


1.5.2. Equivalence relations. Equivalence relations decompose a set into dis- 
joint subsets, called equivalence classes. We begin with an example of an equivalence 
relation on N. 


Example 1.35. Fix a natural number N and say that m ~ n if
m ≡ n (mod N),

meaning that m − n is divisible by N. Two numbers are related by ~ if they have
the same remainder when divided by N. Moreover, the set of natural numbers is the
union of N equivalence classes, consisting of numbers with remainders 0, 1, ..., N − 1
modulo N.




The definition of an equivalence relation differs from the definition of an order 
only by changing antisymmetry to symmetry, but order relations and equivalence 
relations have completely different properties. 


Definition 1.36. An equivalence relation ~ on a set X is a binary relation on X
such that for every x, y, z ∈ X:

(a) x ~ x (reflexivity);

(b) if x ~ y then y ~ x (symmetry);

(c) if x ~ y and y ~ z then x ~ z (transitivity).


For each x ∈ X, the set of elements equivalent to x,

[x/~] = {y ∈ X : x ~ y},
is called the equivalence class of x with respect to ~. When the equivalence relation
is understood, we write the equivalence class [x/~] simply as [x]. The set of
equivalence classes of an equivalence relation ~ on a set X is denoted by X/~.
Note that each element of X/~ is a subset of X, so X/~ is a subset of the power
set P(X) of X.


The following theorem is the basic result about equivalence relations. It says 
that an equivalence relation on a set partitions the set into disjoint equivalence 
classes. 


Theorem 1.37. Let ~ be an equivalence relation on a set X. Every equivalence 
class is non-empty, and X is the disjoint union of the equivalence classes of ~. 


Proof. If x ∈ X, then the reflexivity of ~ implies that x ∈ [x]. Therefore every
equivalence class is non-empty and the union of the equivalence classes is X.

To prove that the union is disjoint, we show that for every x, y ∈ X either
[x] ∩ [y] = Ø (if x ≁ y) or [x] = [y] (if x ~ y).

Suppose that [x] ∩ [y] ≠ Ø. Let z ∈ [x] ∩ [y] be an element in both equivalence
classes. If x_1 ∈ [x], then x_1 ~ z and z ~ y, so x_1 ~ y by the transitivity of ~, and
therefore x_1 ∈ [y]. It follows that [x] ⊂ [y]. A similar argument applied to y_1 ∈ [y]
implies that [y] ⊂ [x], and therefore [x] = [y]. In particular, y ∈ [x], so x ~ y. On
the other hand, if [x] ∩ [y] = Ø, then y ∉ [x] since y ∈ [y], so x ≁ y.


There is a natural projection π : X → X/~, given by π(x) = [x], that maps
each element of X to the equivalence class that contains it. Conversely, we can
index the collection of equivalence classes

X/~ = {[a] : a ∈ A}
by a subset A of X which contains exactly one element from each equivalence class.
It is important to recognize, however, that such an indexing involves an arbitrary
choice of a representative element from each equivalence class, and it is better
to think in terms of the collection of equivalence classes, rather than a subset of
elements.


Example 1.38. The equivalence classes of $\mathbb{N}$ relative to the equivalence relation
$m \sim n$ if $m \equiv n \pmod{3}$ are given by
\[
I_0 = \{3, 6, 9, \dots\}, \qquad I_1 = \{1, 4, 7, \dots\}, \qquad I_2 = \{2, 5, 8, \dots\}.
\]




The projection $\pi : \mathbb{N} \to \{I_0, I_1, I_2\}$ maps a number to its equivalence class, e.g.,
$\pi(101) = I_2$. We can choose $\{1, 2, 3\}$ as a set of representative elements, in which
case
\[
I_0 = [3], \qquad I_1 = [1], \qquad I_2 = [2],
\]
but any other set $A \subset \mathbb{N}$ of three numbers with remainders $0$, $1$, $2$ (mod $3$) will do.
For example, if we choose $A = \{7, 15, 101\}$, then
\[
I_0 = [15], \qquad I_1 = [7], \qquad I_2 = [101].
\]
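As an illustration only, the partition into equivalence classes can be computed for a finite piece of the natural numbers. The sketch below is not part of the notes: the helper names `equivalence_classes`, `congruent_mod_3`, and `projection` are hypothetical, and the relation is restricted to the numbers 1 to 12 so that the computation is finite.

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes of the equivalence relation `related`.

    Because `related` is assumed to be an equivalence relation, it is enough
    to compare each new element with one representative of each class.
    """
    classes = []  # each class is a list; its first entry is the representative
    for x in elements:
        for cls in classes:
            if related(cls[0], x):
                cls.append(x)
                break
        else:
            classes.append([x])  # x starts a new class
    return classes


def congruent_mod_3(m, n):
    """The relation m ~ n if m = n (mod 3)."""
    return (m - n) % 3 == 0


def projection(n):
    """Index j of the class I_j that contains n."""
    return n % 3


classes = equivalence_classes(range(1, 13), congruent_mod_3)
print(classes)
print(projection(101))
```

The representatives chosen here are simply the first elements encountered (1, 2, 3), which reflects the arbitrary choice discussed in the text.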


1.6. Countable and uncountable sets 


One way to show that two sets have the same “size” is to pair off their elements. 
For example, if we can match up every left shoe in a closet with a right shoe, with 
no right shoes left over, then we know that we have the same number of left and 
right shoes. That is, we have the same number of left and right shoes if there is a 
one-to-one, onto map $f : L \to R$, or one-to-one correspondence, from the set $L$ of
left shoes to the set R of right shoes. 


We refer to the “size” of a set as measured by one-to-one correspondences as 
its cardinality. This notion enables us to compare the cardinality of both finite and 
infinite sets. In particular, we can use it to distinguish between “smaller” countably 
infinite sets, such as the integers or rational numbers, and “larger” uncountably 
infinite sets, such as the real numbers. 


Definition 1.39. Two sets $X$, $Y$ have equal cardinality, written $X \approx Y$, if there
is a one-to-one, onto map $f : X \to Y$. The cardinality of $X$ is less than or equal to
the cardinality of $Y$, written $X \lesssim Y$, if there is a one-to-one (but not necessarily
onto) map $g : X \to Y$.


If $X \approx Y$, then we also say that $X$, $Y$ have the same cardinality. We don’t
define the notion of a “cardinal number” here, only the relation between sets of 
“equal cardinality.” 


Note that $\approx$ is an equivalence relation on any collection of sets. In particular,
it is transitive because if $X \approx Y$ and $Y \approx Z$, then there are one-to-one and onto
maps $f : X \to Y$ and $g : Y \to Z$, so $g \circ f : X \to Z$ is one-to-one and onto, and
$X \approx Z$. We may therefore divide any collection of sets into equivalence classes of
sets with equal cardinality.


It follows immediately from the definition that $\lesssim$ is reflexive and transitive.
Furthermore, as stated in the following Schröder-Bernstein theorem, if $X \lesssim Y$ and
$Y \lesssim X$, then $X \approx Y$. This result allows us to prove that two sets have equal
cardinality by constructing one-to-one maps that need not be onto. The statement
of the theorem is intuitively obvious but the proof, while elementary, is surprisingly
involved and can be omitted without loss of continuity. (We will only use the
theorem once, in the proof of Theorem 5.67.)


Theorem 1.40 (* Schröder-Bernstein). If $X$, $Y$ are sets such that there are one-to-one
maps $f : X \to Y$ and $g : Y \to X$, then there is a one-to-one, onto map
$h : X \to Y$.




Proof. We divide $X$ into three disjoint subsets $X_X$, $X_Y$, $X_\infty$ with different
mapping properties as follows.


Consider a point $x_1 \in X$. If $x_1$ is not in the range of $g$, then we say $x_1 \in X_X$.
Otherwise there exists $y_1 \in Y$ such that $g(y_1) = x_1$, and $y_1$ is unique since $g$ is
one-to-one. If $y_1$ is not in the range of $f$, then we say $x_1 \in X_Y$. Otherwise there
exists a unique $x_2 \in X$ such that $f(x_2) = y_1$. Continuing in this way, we generate
a sequence of points
\[
x_1, y_1, x_2, y_2, \dots, x_n, y_n, x_{n+1}, \dots
\]
with $x_n \in X$, $y_n \in Y$ and
\[
g(y_n) = x_n, \qquad f(x_{n+1}) = y_n.
\]
We assign the starting point $x_1$ to a subset in the following way: (a) $x_1 \in X_X$ if
the sequence terminates at some $x_n \in X$ that isn’t in the range of $g$; (b) $x_1 \in X_Y$
if the sequence terminates at some $y_n \in Y$ that isn’t in the range of $f$; (c) $x_1 \in X_\infty$
if the sequence never terminates.


Similarly, if $y_1 \in Y$, then we generate a sequence of points
\[
y_1, x_1, y_2, x_2, \dots, y_n, x_n, y_{n+1}, \dots
\]
with $x_n \in X$, $y_n \in Y$ by
\[
f(x_n) = y_n, \qquad g(y_{n+1}) = x_n,
\]
and we assign $y_1$ to a subset $Y_X$, $Y_Y$, or $Y_\infty$ of $Y$ as follows: (a) $y_1 \in Y_X$ if the
sequence terminates at some $x_n \in X$ that isn’t in the range of $g$; (b) $y_1 \in Y_Y$ if the
sequence terminates at some $y_n \in Y$ that isn’t in the range of $f$; (c) $y_1 \in Y_\infty$ if the
sequence never terminates.


We claim that $f : X_X \to Y_X$ is one-to-one and onto. First, if $x \in X_X$, then
$f(x) \in Y_X$ because the sequence generated by $f(x)$ coincides with the sequence
generated by $x$ after its first term, so both sequences terminate at a point in $X$.
Second, if $y \in Y_X$, then there is $x \in X$ such that $f(x) = y$, otherwise the sequence
would terminate at $y \in Y$, meaning that $y \in Y_Y$. Furthermore, we must have
$x \in X_X$ because the sequence generated by $x$ is a continuation of the sequence
generated by $y$ and therefore also terminates at a point in $X$. Finally, $f$ is one-to-one
on $X_X$ since $f$ is one-to-one on $X$.


The same argument applied to $g : Y_Y \to X_Y$ implies that $g$ is one-to-one and
onto, so $g^{-1} : X_Y \to Y_Y$ is one-to-one and onto.


Finally, similar arguments show that $f : X_\infty \to Y_\infty$ is one-to-one and onto: If
$x \in X_\infty$, then the sequence generated by $f(x) \in Y$ doesn’t terminate, so $f(x) \in Y_\infty$;
and every $y \in Y_\infty$ is the image of a point $x \in X$ which, like $y$, generates a sequence
that does not terminate, so $x \in X_\infty$.


It then follows that $h : X \to Y$ defined by
\[
h(x) = \begin{cases}
f(x) & \text{if } x \in X_X, \\
g^{-1}(x) & \text{if } x \in X_Y, \\
f(x) & \text{if } x \in X_\infty
\end{cases}
\]
is a one-to-one, onto map from $X$ to $Y$.
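To make the construction concrete, here is a small sketch for one assumed example that does not appear in the notes: $X = Y = \mathbb{N}$ with $f(x) = x + 1$ and $g(y) = y + 1$, both one-to-one but neither onto. The function names are hypothetical. In this particular example every backward chain terminates, so $X_\infty$ is empty and only the first two cases of $h$ occur.

```python
def f(x):
    """One-to-one map X -> Y that is not onto (1 is not in its range)."""
    return x + 1


def g(y):
    """One-to-one map Y -> X that is not onto (1 is not in its range)."""
    return y + 1


def f_inverse(y):
    """Partial inverse of f: None when y is not in the range of f."""
    return y - 1 if y > 1 else None


def g_inverse(x):
    """Partial inverse of g: None when x is not in the range of g."""
    return x - 1 if x > 1 else None


def chain_ends_in(x):
    """Follow the sequence x_1, y_1, x_2, ... from x and report where it stops."""
    point, in_X = x, True
    while True:
        previous = g_inverse(point) if in_X else f_inverse(point)
        if previous is None:
            return "X" if in_X else "Y"
        point, in_X = previous, not in_X


def h(x):
    """The map of the proof: f on X_X and g^{-1} on X_Y."""
    return f(x) if chain_ends_in(x) == "X" else g_inverse(x)


print([h(x) for x in range(1, 9)])
```

For this choice of $f$ and $g$, odd numbers lie in $X_X$ and even numbers in $X_Y$, so $h$ swaps each odd number with its successor.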




We can use the cardinality relation to describe the “size” of a set by comparing 
it with standard sets. 
Definition 1.41. A set $X$ is:
(1) Finite if it is the empty set or $X \approx \{1, 2, \dots, n\}$ for some $n \in \mathbb{N}$;
(2) Countably infinite (or denumerable) if $X \approx \mathbb{N}$;
(3) Infinite if it is not finite;
(4) Countable if it is finite or countably infinite;
(5) Uncountable if it is not countable.

We'll take for granted some intuitively obvious facts which follow from the 
definitions. For example, a finite, non-empty set is in one-to-one correspondence 
with {1,2,...,n} for a unique natural number n € N (the number of elements in 
the set), a countably infinite set is not finite, and a subset of a countable set is 
countable. 


According to Definition 1.41, we may divide sets into disjoint classes of finite,
countably infinite, and uncountable sets. We also distinguish between finite and
infinite sets, and countable and uncountable sets. We will show below, in Theorem 2.19,
that the set of real numbers is uncountable, and we refer to its cardinality
as the cardinality of the continuum.


Definition 1.42. A set $X$ has the cardinality of the continuum if $X \approx \mathbb{R}$.


One has to be careful in extrapolating properties of finite sets to infinite sets. 
Example 1.43. The set of squares
\[
S = \{1, 4, 9, 16, \dots, n^2, \dots\}
\]
is countably infinite since $f : \mathbb{N} \to S$ defined by $f(n) = n^2$ is one-to-one and onto.
It may appear surprising at first that the set $\mathbb{N}$ can be in one-to-one correspondence
with an apparently “smaller” proper subset $S$, since this doesn’t happen for finite
sets. In fact, assuming the axiom of choice, one can show that a set is infinite if
and only if it has the same cardinality as a proper subset. Dedekind (1888) used
this property to give a definition of infinite sets that did not depend on the natural
numbers $\mathbb{N}$.


Next, we prove some results about countable sets. The following proposition 
states a useful necessary and sufficient condition for a set to be countable. 


Proposition 1.44. A non-empty set $X$ is countable if and only if there is an onto
map $f : \mathbb{N} \to X$.


Proof. If $X$ is countably infinite, then there is a one-to-one, onto map $f : \mathbb{N} \to X$.
If $X$ is finite and non-empty, then for some $n \in \mathbb{N}$ there is a one-to-one, onto map
$g : \{1, 2, \dots, n\} \to X$. Choose any $x \in X$ and define the onto map $f : \mathbb{N} \to X$ by
\[
f(k) = \begin{cases}
g(k) & \text{if } k = 1, 2, \dots, n, \\
x & \text{if } k = n+1, n+2, \dots.
\end{cases}
\]




Conversely, suppose that such an onto map exists. We define a one-to-one, onto
map $g$ recursively by omitting repeated values of $f$. Explicitly, let $g(1) = f(1)$.
Suppose that $n \ge 1$ and we have chosen $n$ distinct $g$-values $g(1), g(2), \dots, g(n)$. Let
\[
A_n = \{k \in \mathbb{N} : f(k) \ne g(j) \text{ for every } j = 1, 2, \dots, n\}
\]
denote the set of natural numbers whose $f$-values are not already included among
the $g$-values. If $A_n = \emptyset$, then $g : \{1, 2, \dots, n\} \to X$ is one-to-one and onto, and $X$
is finite. Otherwise, let $k_n = \min A_n$, and define $g(n+1) = f(k_n)$, which is distinct
from all of the previous $g$-values. Either this process terminates, and $X$ is finite, or
we go through all the $f$-values and obtain a one-to-one, onto map $g : \mathbb{N} \to X$, and
$X$ is countably infinite.
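The recursive construction of $g$ amounts to scanning $f(1), f(2), \dots$ in order and skipping values already seen. A minimal sketch follows, with the hypothetical name `one_to_one_enumeration`. One caveat: a program cannot detect that $A_n = \emptyset$ by scanning, so when $X$ is finite the generator below never stops on its own and the caller must limit how many values are requested.

```python
from itertools import count, islice


def one_to_one_enumeration(f):
    """Yield g(1), g(2), ... obtained from f(1), f(2), ... by omitting repeats.

    Scanning k = 1, 2, 3, ... and taking the first k whose f-value is new is
    the same as choosing k_n = min A_n at each step.
    """
    seen = set()  # the g-values chosen so far
    for k in count(1):
        value = f(k)
        if value not in seen:
            seen.add(value)
            yield value


# An onto map from N to {0, 1, 2, ...} that takes most values twice.
print(list(islice(one_to_one_enumeration(lambda k: k // 2), 5)))

# An onto map from N to the finite set {0, 1, 2, 3}; ask for at most 4 values.
print(list(islice(one_to_one_enumeration(lambda k: k % 4), 4)))
```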


If $X$ is a countable set, then we refer to an onto function $f : \mathbb{N} \to X$ as an
enumeration of $X$, and write $X = \{x_n : n \in \mathbb{N}\}$, where $x_n = f(n)$.


Proposition 1.45. The Cartesian product N x N is countably infinite. 


Proof. Define a linear order $\prec$ on ordered pairs of natural numbers as follows:
\[
(m, n) \prec (m', n') \quad \text{if either } m + n < m' + n' \text{ or } m + n = m' + n' \text{ and } n < n'.
\]


That is, we arrange $\mathbb{N} \times \mathbb{N}$ in a table


(1,1)  (1,2)  (1,3)  (1,4)  ...
(2,1)  (2,2)  (2,3)  (2,4)  ...
(3,1)  (3,2)  (3,3)  (3,4)  ...
(4,1)  (4,2)  (4,3)  (4,4)  ...
 ...    ...    ...    ...


and list it along successive diagonals from bottom-left to top-right as


(1,1), (2,1), (1,2), (3,1), (2,2), (1,3), (4,1), (3,2), (2,3), (1,4), ...


We define $f : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ by setting $f(n)$ equal to the $n$th pair in this order;
for example, $f(7) = (4, 1)$. Then $f$ is one-to-one and onto, so $\mathbb{N} \times \mathbb{N}$ is countably
infinite.
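The diagonal listing is easy to generate programmatically. The following sketch uses hypothetical names (`diagonal_pairs`, `f`) and simply walks the diagonals $m + n = 2, 3, 4, \dots$, with $n$ increasing along each diagonal as in the order defined in the proof.

```python
from itertools import islice


def diagonal_pairs():
    """Yield the pairs (m, n) of natural numbers in the diagonal order."""
    total = 2  # m + n is constant along each diagonal
    while True:
        for n in range(1, total):  # n increases, so m = total - n decreases
            yield (total - n, n)
        total += 1


def f(k):
    """Return the k-th pair (counting from 1) in the diagonal order."""
    return next(islice(diagonal_pairs(), k - 1, None))


print(list(islice(diagonal_pairs(), 10)))
print(f(7))
```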


Theorem 1.46. A countable union of countable sets is countable. 


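Proof. Let $\{X_n : n \in \mathbb{N}\}$ be a countable collection of countable sets. From
Proposition 1.44, there is an onto map $f_n : \mathbb{N} \to X_n$. We define
\[
g : \mathbb{N} \times \mathbb{N} \to \bigcup_{n \in \mathbb{N}} X_n
\]
by $g(n, k) = f_n(k)$. Then $g$ is also onto. From Proposition 1.45, there is a one-to-one,
onto map $h : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$, and it follows that
\[
g \circ h : \mathbb{N} \to \bigcup_{n \in \mathbb{N}} X_n
\]
is onto, so Proposition 1.44 implies that the union of the $X_n$ is countable.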




The next theorem gives a fundamental example of an uncountable set, namely 
the set of all subsets of natural numbers. The proof uses a “diagonal” argument due 
to Cantor (1891), which is of frequent use in analysis. Recall from Definition 1.1
that the power set of a set is the collection of all its subsets. 


Theorem 1.47. The power set P(N) of N is uncountable. 


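Proof. Let $\mathcal{C} \subset \mathcal{P}(\mathbb{N})$ be a countable collection of subsets of $\mathbb{N}$,
\[
\mathcal{C} = \{A_n \subset \mathbb{N} : n \in \mathbb{N}\}.
\]
Define a subset $A \subset \mathbb{N}$ by
\[
A = \{n \in \mathbb{N} : n \notin A_n\}.
\]
Then $A \ne A_n$ for every $n \in \mathbb{N}$ since either $n \in A$ and $n \notin A_n$ or $n \notin A$ and $n \in A_n$.
Thus, $A \notin \mathcal{C}$. It follows that no countable collection of subsets of $\mathbb{N}$ includes all of
the subsets of $\mathbb{N}$, so $\mathcal{P}(\mathbb{N})$ is uncountable.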


This theorem has an immediate corollary for the set $\Sigma$ of binary sequences
defined in Definition 1.29.


Corollary 1.48. The set $\Sigma$ of binary sequences has the same cardinality as $\mathcal{P}(\mathbb{N})$
and is uncountable.


Proof. By Example 1.30, the set $\Sigma$ is in one-to-one correspondence with $\mathcal{P}(\mathbb{N})$,
which is uncountable.


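It is instructive to write the diagonal argument in terms of binary sequences.
Suppose that $S = \{s_n \in \Sigma : n \in \mathbb{N}\}$ is a countable set of binary sequences that
begins, for example, as follows
\[
\begin{aligned}
s_1 &= 0\,0\,1\,1\,0\,1 \dots \\
s_2 &= 1\,1\,0\,0\,1\,0 \dots \\
s_3 &= 1\,1\,0\,1\,1\,0 \dots \\
s_4 &= 0\,1\,1\,0\,0\,0 \dots \\
s_5 &= 1\,0\,0\,1\,1\,1 \dots \\
s_6 &= 1\,0\,0\,1\,0\,0 \dots
\end{aligned}
\]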


Then we get a sequence $s \notin S$ by going down the diagonal and switching the values
from 0 to 1 or from 1 to 0. For the previous sequences, this gives
\[
s = 1\,0\,1\,1\,0\,1 \dots.
\]
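The diagonal construction can be checked on the six displayed prefixes. In the sketch below the name `diagonal_complement` is hypothetical, and the sequences are truncated to their first six digits, which is all the example provides.

```python
def diagonal_complement(sequences):
    """Flip the n-th digit of the n-th sequence, for each n."""
    return "".join("1" if s[n] == "0" else "0" for n, s in enumerate(sequences))


# First six digits of s_1, ..., s_6 from the example above.
listed = ["001101", "110010", "110110", "011000", "100111", "100100"]

s = diagonal_complement(listed)
print(s)
```

By construction, `s` differs from the $n$th listed sequence in its $n$th digit, so it cannot coincide with any of them.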


We will show in Theorem 5.67 below that $\Sigma$ and $\mathcal{P}(\mathbb{N})$ are also in one-to-one
correspondence with $\mathbb{R}$, so both have the cardinality of the continuum.

A similar diagonal argument to the one used in Theorem 1.47 shows that for
every set $X$ the cardinality of the power set $\mathcal{P}(X)$ is strictly greater than the
cardinality of $X$. In particular, the cardinality of $\mathcal{P}(\mathcal{P}(\mathbb{N}))$ is strictly greater than
the cardinality of $\mathcal{P}(\mathbb{N})$, the cardinality of $\mathcal{P}(\mathcal{P}(\mathcal{P}(\mathbb{N})))$ is strictly greater than
the cardinality of $\mathcal{P}(\mathcal{P}(\mathbb{N}))$, and so on. Thus, there are many other uncountable
cardinalities apart from the cardinality of the continuum.


Cantor (1878) raised the question of whether or not there are any sets whose 
cardinality lies strictly between that of N and P(N). The statement that there 
are no such sets is called the continuum hypothesis, which may be formulated as 
follows. 


Hypothesis 1.49 (Continuum). If $C \subset \mathcal{P}(\mathbb{N})$ is infinite, then either $C \approx \mathbb{N}$ or
$C \approx \mathcal{P}(\mathbb{N})$.


The work of Gödel (1940) and Cohen (1963) established the remarkable result
that the continuum hypothesis cannot be proved or disproved from the standard 
axioms of set theory (assuming, as we believe to be the case, that these axioms 
are consistent). This result illustrates a fundamental and unavoidable incomplete- 
ness in the ability of any finite system of axioms to capture the properties of any 
mathematical structure that is rich enough to include the natural numbers. 


Chapter 2 


Numbers 


God created the integers and the rest is the work of man. (Leopold 
Kronecker, in an after-dinner speech at a conference, Berlin, 1886) 


“God created the integers and the rest is the work of man.” This maxim 
spoken by the algebraist Kronecker reveals more about his past as a 
banker who grew rich through monetary speculation than about his philo- 
sophical insight. There is hardly any doubt that, from a psychological 
and, for the writer, ontological point of view, the geometric continuum is 
the primordial entity. If one has any consciousness at all, it is conscious- 
ness of time and space; geometric continuity is in some way inseparably 
bound to conscious thought. (René Thom, 1986) 


In this chapter, we describe the properties of the basic number systems. We 
briefly discuss the integers and rational numbers, and then consider the real num- 
bers in more detail. 


The real numbers form a complete number system which includes the rational 
numbers as a dense subset. We will summarize the properties of the real numbers in 
a list of intuitively reasonable axioms, which we assume in everything that follows. 
These axioms are of three types: (a) algebraic; (b) ordering; (c) completeness. The 
completeness of the real numbers is what distinguishes them from the rational
numbers and is the essential property for analysis.


The rational numbers may be constructed from the natural numbers as pairs 
of integers, and there are several ways to construct the real numbers from the ra- 
tional numbers. For example, Dedekind used cuts of the rationals, while Cantor 
used equivalence classes of Cauchy sequences of rational numbers. The real num- 
bers that are constructed in either way satisfy the axioms given in this chapter. 
These constructions show that the real numbers are as well-founded as the natural 
numbers (at least, if we take set theory for granted), but they don’t lead to any 
new properties of the real numbers, and we won’t describe them here. 




2.1. Integers 


Why then is this view [the induction principle] imposed upon us with such 
an irresistible weight of evidence? It is because it is only the affirmation 
of the power of the mind which knows it can conceive of the indefinite 
repetition of the same act, when that act is once possible. (Poincaré, 
1902) 


The set of natural numbers, or positive integers, is 
N = {1,2,3,...}. 


We add and multiply natural numbers in the usual way. (The formal algebraic 
properties of addition and multiplication on N follow from the ones stated below 
for R.) 

An essential property of the natural numbers is the following induction princi- 
ple, which expresses the idea that we can reach every natural number by counting 
upwards from one. 


Axiom 2.1. Suppose that $A \subset \mathbb{N}$ is a set of natural numbers such that: (a) $1 \in A$;
(b) $n \in A$ implies $(n + 1) \in A$. Then $A = \mathbb{N}$.


This principle, together with appropriate algebraic properties, is enough to 
completely characterize the natural numbers. For example, one standard set of 
axioms is the Peano axioms, first stated by Dedekind [8], but we won’t describe 
them in detail here. 


As an illustration of how induction can be used, we prove the following result
for the sum of the first $n$ squares, written in summation notation as
\[
\sum_{k=1}^{n} k^2 = 1^2 + 2^2 + 3^2 + \dots + n^2.
\]
Proposition 2.2. For every $n \in \mathbb{N}$,
\[
\sum_{k=1}^{n} k^2 = \frac{1}{6} n(n+1)(2n+1).
\]


Proof. Let $A$ be the set of $n \in \mathbb{N}$ for which this identity holds. It holds for $n = 1$,
so $1 \in A$. Suppose the identity holds for some $n \in \mathbb{N}$. Then
\[
\begin{aligned}
\sum_{k=1}^{n+1} k^2 &= \sum_{k=1}^{n} k^2 + (n+1)^2 \\
&= \frac{1}{6} n(n+1)(2n+1) + (n+1)^2 \\
&= \frac{1}{6} (n+1)\left(2n^2 + 7n + 6\right) \\
&= \frac{1}{6} (n+1)(n+2)(2n+3).
\end{aligned}
\]
It follows that the identity holds when $n$ is replaced by $n + 1$. Thus $n \in A$ implies
that $(n + 1) \in A$, so $A = \mathbb{N}$, and the proposition follows by induction.




Note that the right hand side of the identity in Proposition 2.2 is always an
integer, as it must be, since one of $n$, $n + 1$ is divisible by 2 and one of $n$, $n + 1$,
$2n + 1$ is divisible by 3.


Equations for the sum of the first $n$ cubes,
\[
\sum_{k=1}^{n} k^3 = \frac{1}{4} n^2 (n+1)^2,
\]
and other powers can be proved by induction in a similar way. Another example of a
result that can be proved by induction is the Euler-Binet formula in Proposition 3.9
for the terms in the Fibonacci sequence.


One defect of such a proof by induction is that although it verifies the result, it 
does not explain where the original hypothesis comes from. A separate argument is 
often required to come up with a plausible hypothesis. For example, it is reasonable 
to guess that the sum of the first n squares might be a cubic polynomial in n. The 
possible values of the coefficients can then be found by evaluating the first few 
sums, after which the general result may be verified by induction. 
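The guessing step can be sketched as follows, assuming the ansatz $c_0 + c_1 n + c_2 n^2 + c_3 n^3$ and matching it to the sums for $n = 1, 2, 3, 4$ with exact rational arithmetic. The helper `solve` is a hypothetical, bare-bones Gauss-Jordan elimination written for this illustration. The fit only produces a candidate formula; it is the induction argument that proves it.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square, nonsingular linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [a / rows[col][col] for a in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def sum_of_squares(n):
    return sum(k * k for k in range(1, n + 1))


# Match c0 + c1*n + c2*n^2 + c3*n^3 to the first four sums.
points = [1, 2, 3, 4]
coefficients = solve([[n**j for j in range(4)] for n in points],
                     [sum_of_squares(n) for n in points])
print(coefficients)  # coefficients c0, c1, c2, c3 as exact fractions
```

The resulting coefficients agree with the expansion $\frac{1}{6} n(n+1)(2n+1) = \frac{1}{3} n^3 + \frac{1}{2} n^2 + \frac{1}{6} n$.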


The set of integers consists of the natural numbers, their negatives (or additive 
inverses), and zero (the additive identity): 


\[
\mathbb{Z} = \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\}.
\]


We can add, subtract, and multiply integers in the usual way. In algebraic terminology,
$(\mathbb{Z}, +, \cdot)$ is a commutative ring with identity.


Like the natural numbers N, the integers Z are countably infinite. 
Proposition 2.3. The set of integers Z is countably infinite. 


Proof. The function $f : \mathbb{N} \to \mathbb{Z}$ defined by $f(1) = 0$, and
\[
f(2n) = n, \qquad f(2n+1) = -n \qquad \text{for } n \ge 1,
\]
is one-to-one and onto.


The function in the previous proof corresponds to listing the integers as
\[
0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, \dots.
\]
Alternatively, but less directly, we can prove Proposition 2.3 by writing
\[
\mathbb{Z} = -\mathbb{N} \cup \{0\} \cup \mathbb{N}
\]
as a countable union of countable sets and applying Theorem 1.46.
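The enumeration of the integers and its inverse can be written out directly. In this sketch the function names are hypothetical; `f_inverse` is included only to check, on a finite range, that the listing hits every integer exactly once.

```python
def f(k):
    """The enumeration 0, 1, -1, 2, -2, ...: f(1) = 0, f(2n) = n, f(2n + 1) = -n."""
    if k == 1:
        return 0
    n, remainder = divmod(k, 2)
    return n if remainder == 0 else -n


def f_inverse(z):
    """The position of the integer z in the list."""
    if z == 0:
        return 1
    return 2 * z if z > 0 else -2 * z + 1


print([f(k) for k in range(1, 12)])
```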


2.2. Rational numbers 


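A rational number is a ratio of integers. We denote the set of rational numbers by
\[
\mathbb{Q} = \left\{ \frac{p}{q} : p, q \in \mathbb{Z} \text{ and } q \ne 0 \right\}
\]
where we may cancel common factors from the numerator and denominator, meaning
that
\[
\frac{p_1}{q_1} = \frac{p_2}{q_2} \quad \text{if and only if } p_1 q_2 = p_2 q_1.
\]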




We can add, subtract, multiply, and divide (except by 0) rational numbers in the
usual way. In algebraic terminology, $(\mathbb{Q}, +, \cdot)$ is a field. We state the field axioms
explicitly for $\mathbb{R}$ in Axiom 2.6 below.


We can construct $\mathbb{Q}$ from $\mathbb{Z}$ as the collection of equivalence classes in $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$
with respect to the equivalence relation $(p_1, q_1) \sim (p_2, q_2)$ if $p_1 q_2 = p_2 q_1$. The usual
sums and products of rational numbers are well-defined on these equivalence classes.


The rational numbers are linearly ordered by their standard order, and this
order is compatible with the algebraic structure of $\mathbb{Q}$. Thus, $(\mathbb{Q}, +, \cdot, <)$ is an
ordered field. Moreover, this order is dense, meaning that if $r_1, r_2 \in \mathbb{Q}$ and $r_1 < r_2$,
then there exists a rational number $r \in \mathbb{Q}$ between them with $r_1 < r < r_2$. For
example, we can take
\[
r = \frac{1}{2}(r_1 + r_2).
\]


The fact that the rational numbers are densely ordered might suggest that they 
contain all the numbers we need. But this is not the case: they have a lot of “gaps,” 
which are filled by the irrational real numbers. 


The following theorem shows that $\sqrt{2}$ is irrational. In particular, the length of
the hypotenuse of a right-angled triangle with sides of length one is not rational. 
Thus, the rational numbers are inadequate even for Euclidean geometry; they are 
yet more inadequate for analysis. 


The irrationality of $\sqrt{2}$ was discovered by the Pythagoreans of Ancient Greece
in the 5th century BC, perhaps by Hippasus of Metapontum. According to one 
legend, the Pythagoreans celebrated the discovery by sacrificing one hundred oxen. 
According to another legend, Hippasus showed the proof to Pythagoras on a boat, 
while they were having a lesson. Pythagoras believed that, like the musical scales, 
everything in the universe could be reduced to ratios of integers and threw Hippasus 
overboard to drown. 


Theorem 2.4. There is no rational number $x \in \mathbb{Q}$ such that $x^2 = 2$.

Proof. Suppose for contradiction that $x^2 = 2$ and $x = p/q$ where $p, q \in \mathbb{N}$. By
canceling common factors, we can assume $p$ and $q$ are relatively prime (that is, the
only integers that divide both $p$ and $q$ are $\pm 1$). Since $x^2 = 2$, we have
\[
p^2 = 2q^2,
\]
which implies that $p^2$ is even. Since the square of an odd number is odd, $p = 2r$
must be even. Therefore
\[
q^2 = 2r^2,
\]
which implies that $q = 2s$ is even. Hence $p$ and $q$ have a common factor of 2, which
contradicts the initial assumption.


Theorem 2.4 may raise the question of whether there is a real number $x \in \mathbb{R}$
such that $x^2 = 2$. As we will see in Example 7.47 below, there is.

A similar proof, using the prime factorization of integers, shows that $\sqrt{n}$ is
irrational for every $n \in \mathbb{N}$ that isn’t a perfect square. Two other examples of
irrational numbers are $\pi$ and $e$. We will prove in Theorem 4.52 that $e$ is irrational,
but a proof of the irrationality of $\pi$ is harder.




The following result may appear somewhat surprising at first, but it is another 
indication that there are not “enough” rationals. 


Theorem 2.5. The rational numbers are countably infinite. 


Proof. The rational numbers are not finite since, for example, they contain the
countably infinite set of integers as a subset, so we just have to show that $\mathbb{Q}$ is
countable.

Let $\mathbb{Q}^+ = \{x \in \mathbb{Q} : x > 0\}$ denote the set of positive rational numbers, and
define the onto (but not one-to-one) map
\[
g : \mathbb{N} \times \mathbb{N} \to \mathbb{Q}^+, \qquad g(p, q) = \frac{p}{q}.
\]
Let $h : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a one-to-one, onto map, as obtained in Proposition 1.45, and
define $f : \mathbb{N} \to \mathbb{Q}^+$ by $f = g \circ h$. Then $f : \mathbb{N} \to \mathbb{Q}^+$ is onto, and Proposition 1.44
implies that $\mathbb{Q}^+$ is countable. It follows that $\mathbb{Q} = \mathbb{Q}^- \cup \{0\} \cup \mathbb{Q}^+$, where $\mathbb{Q}^- \approx \mathbb{Q}^+$
denotes the set of negative rational numbers, is countable.
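As a sketch of the proof in executable form, one can walk the pairs $(p, q)$ along the diagonals, apply $g(p, q) = p/q$, and skip values that have already appeared, which turns the onto map $g \circ h$ into a one-to-one listing. The generator name `positive_rationals` is hypothetical, and the diagonal order used for $h$ is the one described for $\mathbb{N} \times \mathbb{N}$ earlier in the notes.

```python
from fractions import Fraction
from itertools import islice


def positive_rationals():
    """Yield every positive rational exactly once, in diagonal order of (p, q)."""
    seen = set()
    total = 2  # p + q is constant along each diagonal
    while True:
        for q in range(1, total):
            r = Fraction(total - q, q)  # g(p, q) = p / q with p = total - q
            if r not in seen:           # g is onto but not one-to-one
                seen.add(r)
                yield r
        total += 1


print([str(r) for r in islice(positive_rationals(), 10)])
```

The pair $(2, 2)$ is skipped because $2/2 = 1$ has already been listed as $1/1$.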


Alternatively, we can write
\[
\mathbb{Q} = \bigcup_{q \in \mathbb{N}} \{p/q : p \in \mathbb{Z}\}
\]
as a countable union of countable sets, and use Theorem 1.46. As we prove in
Theorem 2.19, the real numbers are uncountable, so there are many “more” irrational
numbers than rational numbers.


2.3. Real numbers: algebraic properties 


The algebraic properties of $\mathbb{R}$ are summarized in the following axioms, which state
that $(\mathbb{R}, +, \cdot)$ is a field.


Axiom 2.6. There exist binary operations
\[
a, m : \mathbb{R} \times \mathbb{R} \to \mathbb{R},
\]
written $a(x, y) = x + y$ and $m(x, y) = x \cdot y = xy$, and elements $0, 1 \in \mathbb{R}$ such that
for all $x, y, z \in \mathbb{R}$:
(a) $x + 0 = x$ (existence of an additive identity 0);
(b) for every $x \in \mathbb{R}$ there exists $y \in \mathbb{R}$ such that $x + y = 0$ (existence of an additive
inverse $y = -x$);
(c) $x + (y + z) = (x + y) + z$ (addition is associative);
(d) $x + y = y + x$ (addition is commutative);
(e) $x1 = x$ (existence of a multiplicative identity 1);
(f) for every $x \in \mathbb{R} \setminus \{0\}$, there exists $y \in \mathbb{R}$ such that $xy = 1$ (existence of a
multiplicative inverse $y = x^{-1}$);
(g) $x(yz) = (xy)z$ (multiplication is associative);
(h) $xy = yx$ (multiplication is commutative);
(i) $(x + y)z = xz + yz$ (multiplication is distributive over addition).




Axioms (a)-(d) say that R is a commutative group with respect to addition; 
axioms (e)—(h) say that R \ {0} is a commutative group with respect to multipli- 
cation; and axiom (i) says that addition and multiplication are compatible, in the 
sense that they satisfy a distributive law. 


All of the usual algebraic properties of addition, subtraction (subtracting $x$
means adding $-x$), multiplication, and division (dividing by $x$ means multiplying
by $x^{-1}$) follow from these axioms, although we will not derive them in detail. The
natural number $n \in \mathbb{N}$ is obtained by adding one to itself $n$ times, the integer $-n$ is
its additive inverse, and $p/q = pq^{-1}$, where $p$, $q$ are integers with $q \ne 0$, is a rational
number. Thus, $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$.


2.4. Real numbers: ordering properties 


The real numbers have a natural order relation that is compatible with their alge- 
braic structure. We visualize the ordered real numbers as the real line, with smaller 
numbers to the left and larger numbers to the right. 


Axiom 2.7. There is a strict linear order $<$ on $\mathbb{R}$ such that for all $x, y, z \in \mathbb{R}$:
(a) either $x < y$, $x = y$, or $x > y$;
(b) if $x < y$ then $x + z < y + z$;
(c) if $x < y$ and $z > 0$, then $xz < yz$.


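For any $a, b \in \mathbb{R}$ with $a < b$, we define the open intervals
\[
\begin{aligned}
(-\infty, b) &= \{x \in \mathbb{R} : x < b\}, \\
(a, b) &= \{x \in \mathbb{R} : a < x < b\}, \\
(a, \infty) &= \{x \in \mathbb{R} : a < x\},
\end{aligned}
\]
the closed intervals
\[
\begin{aligned}
(-\infty, b] &= \{x \in \mathbb{R} : x \le b\}, \\
[a, b] &= \{x \in \mathbb{R} : a \le x \le b\}, \\
[a, \infty) &= \{x \in \mathbb{R} : a \le x\},
\end{aligned}
\]
and the half-open intervals
\[
\begin{aligned}
(a, b] &= \{x \in \mathbb{R} : a < x \le b\}, \\
[a, b) &= \{x \in \mathbb{R} : a \le x < b\}.
\end{aligned}
\]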


All standard properties of inequalities follow from Axiom 2.6 and Axiom 2.7.
For example: if $x < y$ and $z < 0$, then $xz > yz$, meaning that the direction of
an inequality is reversed when it is multiplied by a negative number; and $x^2 > 0$
for every $x \ne 0$. In future, when we write an inequality such as $x < y$, we will
implicitly require that $x, y \in \mathbb{R}$.

Real numbers satisfy many inequalities. A simple, but fundamental, example 
is the following. 


Proposition 2.8. If $x, y \in \mathbb{R}$, then
\[
xy \le \frac{1}{2}\left(x^2 + y^2\right),
\]
with equality if and only if $x = y$.

Proof. We have
\[
0 \le (x - y)^2 = x^2 - 2xy + y^2,
\]
with equality if and only if $x = y$, so $2xy \le x^2 + y^2$.


On writing $x = \sqrt{a}$, $y = \sqrt{b}$, where $a, b \ge 0$, in the result of Proposition 2.8,
we get that
\[
\sqrt{ab} \le \frac{a + b}{2},
\]


which says that the geometric mean of two nonnegative numbers is less than or 
equal to their arithmetic mean, with equality if and only if the numbers are equal. 
A geometric interpretation of this inequality is that the square-root of the area 
of a rectangle is less than or equal to one-quarter of its perimeter, with equality 
if and only if the rectangle is a square. Thus, a square encloses the largest area 
among all rectangles of a given perimeter, which is a simple form of an isoperimetric 
inequality. 

The arithmetic-geometric mean inequality generalizes to more than two numbers:
If $n \in \mathbb{N}$ and $a_1, a_2, \dots, a_n \ge 0$ are nonnegative real numbers, then
\[
(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \dots + a_n}{n},
\]
with equality if and only if all of the $a_k$ are equal. For a proof, see e.g., Steele [13].


2.5. The supremum and infimum 


Next, we use the ordering properties of R to define the supremum and infimum of 
a set of real numbers. These concepts are of central importance in analysis. In 
particular, in the next section we use them to state the completeness property of 
R. 


First, we define upper and lower bounds. 


Definition 2.9. A set $A \subset \mathbb{R}$ of real numbers is bounded from above if there exists
a real number $M \in \mathbb{R}$, called an upper bound of $A$, such that $x \le M$ for every
$x \in A$. Similarly, $A$ is bounded from below if there exists $m \in \mathbb{R}$, called a lower
bound of $A$, such that $x \ge m$ for every $x \in A$. A set is bounded if it is bounded
both from above and below.


Equivalently, a set $A$ is bounded if $A \subset I$ for some bounded interval $I = [m, M]$.


Example 2.10. The interval $(0, 1)$ is bounded from above by every $M \ge 1$ and
from below by every $m \le 0$. The interval $(-\infty, 0)$ is bounded from above by every
$M \ge 0$, but it is not bounded from below. The set of integers $\mathbb{Z}$ is not bounded from
above or below.


If $A \subset \mathbb{R}$, we define $-A \subset \mathbb{R}$ by
\[
-A = \{y \in \mathbb{R} : y = -x \text{ for some } x \in A\}.
\]
For example, if $A = (0, \infty)$ consists of the positive real numbers, then $-A = (-\infty, 0)$
consists of the negative real numbers. A number $m$ is a lower bound of
$A$ if and only if $M = -m$ is an upper bound of $-A$. Thus, every result for upper
bounds has a corresponding result for lower bounds, and we will often consider only
upper bounds.


Definition 2.11. Suppose that $A \subset \mathbb{R}$ is a set of real numbers. If $M \in \mathbb{R}$ is an
upper bound of $A$ such that $M \le M'$ for every upper bound $M'$ of $A$, then $M$ is
called the least upper bound or supremum of $A$, denoted
\[
M = \sup A.
\]
If $m \in \mathbb{R}$ is a lower bound of $A$ such that $m \ge m'$ for every lower bound $m'$ of $A$,
then $m$ is called the greatest lower bound or infimum of $A$, denoted
\[
m = \inf A.
\]
If $A = \{x_i : i \in I\}$ is an indexed subset of $\mathbb{R}$, we also write
\[
\sup A = \sup_{i \in I} x_i, \qquad \inf A = \inf_{i \in I} x_i.
\]

As an immediate consequence of the definition, we note that the supremum (or
infimum) of a set is unique if one exists: If $M$, $M'$ are suprema of $A$, then $M \le M'$
since $M'$ is an upper bound of $A$ and $M$ is a least upper bound; similarly, $M' \le M$,
so $M = M'$. Furthermore, the supremum of a nonempty set $A$ is always greater
than or equal to its infimum if both exist. To see this, choose any $x \in A$. Since $\inf A$
is a lower bound and $\sup A$ is an upper bound of $A$, we have $\inf A \le x \le \sup A$.


If $\sup A \in A$, then we also denote it by $\max A$ and refer to it as the maximum of
$A$; and if $\inf A \in A$, then we also denote it by $\min A$ and refer to it as the minimum
of $A$. As the following examples illustrate, $\sup A$ and $\inf A$ may or may not belong
to $A$, so the concepts of supremum and infimum must be clearly distinguished from
those of maximum and minimum.


Example 2.12. Every finite set of real numbers
\[
A = \{x_1, x_2, \dots, x_n\}
\]
is bounded. Its supremum is the greatest element,
\[
\sup A = \max\{x_1, x_2, \dots, x_n\},
\]
and its infimum is the smallest element,
\[
\inf A = \min\{x_1, x_2, \dots, x_n\}.
\]
Both the supremum and infimum of a finite set belong to the set.


Example 2.13. If $A = (0, 1)$, then every $M \ge 1$ is an upper bound of $A$. The
least upper bound is $M = 1$, so
\[
\sup(0, 1) = 1.
\]
Similarly, every $m \le 0$ is a lower bound of $A$, so
\[
\inf(0, 1) = 0.
\]
In this case, neither $\sup A$ nor $\inf A$ belongs to $A$. The set $R = (0, 1) \cap \mathbb{Q}$ of rational
numbers in $(0, 1)$, the closed interval $B = [0, 1]$, and the half-open interval $C = (0, 1]$
all have the same supremum and infimum as $A$. Neither $\sup R$ nor $\inf R$ belongs to
$R$, while both $\sup B$ and $\inf B$ belong to $B$, and only $\sup C$ belongs to $C$.




Example 2.14. Let
\[
A = \left\{ \frac{1}{n} : n \in \mathbb{N} \right\}
\]
be the set of reciprocals of the natural numbers. Then $\sup A = 1$, which belongs to
$A$, and $\inf A = 0$, which does not belong to $A$.


A set must be bounded from above to have a supremum (or bounded from below
to have an infimum), but the following notation for unbounded sets is convenient.
We introduce a system of extended real numbers
\[
\overline{\mathbb{R}} = \{-\infty\} \cup \mathbb{R} \cup \{\infty\}
\]
which includes two new elements denoted $-\infty$ and $\infty$, ordered so that $-\infty < x < \infty$
for every $x \in \mathbb{R}$.


Definition 2.15. If a set $A \subset \mathbb{R}$ is not bounded from above, then $\sup A = \infty$, and
if $A$ is not bounded from below, then $\inf A = -\infty$.


For example, $\sup \mathbb{N} = \infty$ and $\inf \mathbb{R} = -\infty$. We also define $\sup \emptyset = -\infty$ and
$\inf \emptyset = \infty$, since, by a strict interpretation of logic, every real number is both
an upper and a lower bound of the empty set. With these conventions, every set
of real numbers has a supremum and an infimum in $\overline{\mathbb{R}}$. Moreover, we may define
the supremum and infimum of sets of extended real numbers in an obvious way; for
example, $\sup A = \infty$ if $\infty \in A$ and $\inf A = -\infty$ if $-\infty \in A$.

While $\overline{\mathbb{R}}$ is linearly ordered, we cannot make it into a field however we extend
addition and multiplication from $\mathbb{R}$ to $\overline{\mathbb{R}}$. Expressions such as $\infty - \infty$ or $0 \cdot \infty$
are inherently ambiguous. To avoid any possible confusion, we will give explicit
definitions in terms of $\mathbb{R}$ alone for every expression that involves $\pm\infty$. Moreover,
when we say that sup A or inf A exists, we will always mean that it exists as a 
real number, not as an extended real number. To emphasize this meaning, we will 
sometimes say that the supremum or infimum “exists as a finite real number.” 


2.6. Real numbers: completeness 


The rational numbers Q and real numbers R have similar algebraic and order prop- 
erties (they are both densely ordered fields). The crucial property that distinguishes 
R from Q is its completeness. There are two main ways to define the completeness 
of R. The first, which we describe here, is based on the order properties of R and 
the existence of suprema. The second, which we describe in Chapter 3, is based on 
the metric properties of R and the convergence of Cauchy sequences. 


We begin with an example that illustrates the difference between Q and R. 
Example 2.16. Define A ⊂ ℚ by 
\[ A = \{ x \in \mathbb{Q} : x^2 < 2 \}. \]
Then A is bounded from above by every M ∈ ℚ⁺ such that M² > 2. Nevertheless, 
A has no supremum in ℚ because √2 is irrational: for every upper bound M ∈ ℚ 
there exists M′ ∈ ℚ such that √2 < M′ < M, so M isn’t a least upper bound of A 
in ℚ. On the other hand, A has a supremum in ℝ, namely sup A = √2. 
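
The claim that every rational upper bound M of A admits a smaller rational upper bound M′ can be illustrated numerically. The sketch below is not part of the original notes: it assumes one particular choice, M′ = (M + 2/M)/2 (a Newton step for x² = 2), which satisfies √2 < M′ < M whenever M > 0 and M² > 2. The function name `smaller_upper_bound` is hypothetical.

```python
from fractions import Fraction


def smaller_upper_bound(M: Fraction) -> Fraction:
    """Given a rational M > 0 with M^2 > 2, return a rational M' with
    sqrt(2) < M' < M.  The formula (M + 2/M)/2 is an illustrative choice,
    not the construction used in the notes."""
    if M <= 0 or M * M <= 2:
        raise ValueError("M must be a positive rational with M^2 > 2")
    return (M + 2 / M) / 2


# Starting from the upper bound M = 2, build a strictly decreasing chain of
# rational upper bounds of A = {x in Q : x^2 < 2}.
M = Fraction(2)
bounds = [M]
for _ in range(4):
    M = smaller_upper_bound(M)
    bounds.append(M)

for b in bounds:
    print(b, float(b))
```

Each bound in the chain is again a rational number whose square exceeds 2, so none of them can be a least upper bound of A in ℚ.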




The following axiomatic property of the real numbers is called Dedekind 
completeness. Dedekind (1872) showed that the real numbers are characterized by the 
condition that they are a complete ordered field (that is, by Axiom 2.6, Axiom 2.7, 
and Axiom 2.17). 


Axiom 2.17. Every nonempty set of real numbers that is bounded from above has 
a supremum. 


Since inf A = −sup(−A) and A is bounded from below if and only if −A 
is bounded from above, it follows that every nonempty set of real numbers that 
is bounded from below has an infimum. The restriction to nonempty sets in 
Axiom 2.17 is necessary, since the empty set is bounded from above, but its supremum 
does not exist. 


As a first application of this axiom, we prove that R has the Archimedean 
property, meaning that no real number is greater than every natural number. 


Theorem 2.18. If x ∈ ℝ, then there exists n ∈ ℕ such that x < n. 


Proof. Suppose, for contradiction, that there exists x ∈ ℝ such that x > n for every 
n ∈ ℕ. Then x is an upper bound of ℕ, so ℕ has a supremum M = sup ℕ ∈ ℝ. 
Since n ≤ M for every n ∈ ℕ, we have n − 1 ≤ M − 1 for every n ∈ ℕ, which 
implies that n ≤ M − 1 for every n ∈ ℕ (apply the previous inequality to n + 1 ∈ ℕ). 
But then M − 1 is an upper bound of ℕ, 
which contradicts the assumption that M is a least upper bound. 


By taking reciprocals, we also get from this theorem that for every ε > 0 there 
exists n ∈ ℕ such that 0 < 1/n < ε. 


These results say roughly that there are no infinite or infinitesimal real 
numbers. This property is consistent with our intuitive picture of a real line ℝ that 
does not “extend past the natural numbers,” where the natural numbers are 
obtained by counting upwards from 1. Robinson (1961) introduced extensions of the 
real numbers, called non-standard real numbers, which form non-Archimedean 
ordered fields with both infinite and infinitesimal elements, but they do not satisfy 
Axiom 2.17. 


The following proof of the uncountability of R is based on its completeness and 
is Cantor’s original proof (1874). The idea is to show that given any countable set 
of real numbers, there are additional real numbers in the “gaps” between them. 


Theorem 2.19. The set of real numbers is uncountable. 


Proof. Suppose that 
\[ S = \{ x_1, x_2, x_3, \dots, x_n, \dots \} \]
is a countably infinite set of distinct real numbers. We will prove that there is a 
real number x ∈ ℝ that does not belong to S. 


If x_1 is the largest element of S, then no real number greater than x_1 belongs to 
S. Otherwise, we select recursively from S an increasing sequence of real numbers 
a_k and a decreasing sequence b_k as follows. Let a_1 = x_1 and choose b_1 = x_{n_1}, 
where n_1 is the smallest integer such that x_{n_1} > a_1. Then x_n ∉ (a_1, b_1) for all 
1 ≤ n ≤ n_1. If x_n ∉ (a_1, b_1) for all n ∈ ℕ, then no real number in (a_1, b_1) belongs to 
S, and we are done; for example, take x = (a_1 + b_1)/2. Otherwise, choose a_2 = x_{m_2}, where 
m_2 > n_1 is the smallest integer such that a_1 < x_{m_2} < b_1. Then x_n ∉ (a_2, b_1) for all 
1 ≤ n ≤ m_2. If x_n ∉ (a_2, b_1) for all n ∈ ℕ, we are done. Otherwise, choose b_2 = x_{n_2}, 
where n_2 > m_2 is the smallest integer such that a_2 < x_{n_2} < b_1. 

Continuing in this way, we either stop after finitely many steps and get an 
interval that contains no element of S, or we get subsets {a_1, a_2, ...} and {b_1, b_2, ...} of 
{x_1, x_2, ...} such that 
\[ a_1 < a_2 < \dots < a_k < \dots < b_k < \dots < b_2 < b_1. \]
It follows from the construction that for each n ∈ ℕ, we have x_n ∉ (a_k, b_k) when k 
is sufficiently large. Let 
\[ a = \sup_{k \in \mathbb{N}} a_k, \qquad b = \inf_{k \in \mathbb{N}} b_k, \]
which exist by the completeness of ℝ. Then a ≤ b (see Proposition 2.22 below) and 
x ∉ S if a ≤ x ≤ b, which proves the result. 


This theorem shows that ℝ is uncountable, but it doesn’t show that ℝ has the 
same cardinality as the power set P(ℕ) of the natural numbers, whose 
uncountability was proved in Theorem 1.47. In Theorem 5.67, we show that ℝ has the same 
cardinality as P(ℕ); this provides a second proof that ℝ is uncountable and shows 
that P(ℕ) has the cardinality of the continuum. 


2.7. Properties of the supremum and infimum 


In this section, we collect some properties of the supremum and infimum for later 
use. This section can be referred back to as needed. 


First, we state an equivalent way to characterize the supremum and infimum, 
which is an immediate consequence of their definition. 


Proposition 2.20. If A ⊂ ℝ, then M = sup A if and only if: (a) M is an upper 
bound of A; (b) for every M′ < M there exists x ∈ A such that x > M′. Similarly, 
m = inf A if and only if: (a) m is a lower bound of A; (b) for every m′ > m there 
exists x ∈ A such that x < m′. 


We frequently use this proposition as follows: (a) if M is an upper bound of 
A, then sup A ≤ M; (b) if A is nonempty and bounded from above, then for every 
ε > 0, there exists x ∈ A such that x > sup A − ε. Similarly: (a) if m is a lower 
bound of A, then m ≤ inf A; (b) if A is nonempty and bounded from below, then 
for every ε > 0, there exists x ∈ A such that x < inf A + ε. 


Making a set smaller decreases its supremum and increases its infimum. In the 
following inequalities, we allow the sup and inf to be extended real numbers. 


Proposition 2.21. Suppose that A, B are subsets of ℝ such that A ⊂ B. Then 
sup A ≤ sup B, and inf A ≥ inf B. 


Proof. The result is immediate if B = Ø, when A = Ø, so we may assume that B 
is nonempty. If B is not bounded from above, then sup B = ∞, so sup A ≤ sup B. 
If B is bounded from above, then sup B is an upper bound of B. Since A ⊂ B, it 
follows that sup B is an upper bound of A, so sup A ≤ sup B. Similarly, either 
inf B = −∞ or inf B is a lower bound of A, so inf A ≥ inf B. 




The next proposition states that if every element in one set is less than or equal 
to every element in another set, then the sup of the first set is less than or equal to 
the inf of the second set. 


Proposition 2.22. Suppose that A, B are nonempty sets of real numbers such 
that x ≤ y for all x ∈ A and y ∈ B. Then sup A ≤ inf B. 


Proof. Fix y ∈ B. Since x ≤ y for all x ∈ A, it follows that y is an upper bound 
of A, so sup A is finite and sup A ≤ y. Hence, sup A is a lower bound of B, so inf B 
is finite and sup A ≤ inf B. 


If A ⊂ ℝ and c ∈ ℝ, then we define 
\[ cA = \{ y \in \mathbb{R} : y = cx \text{ for some } x \in A \}. \]


Multiplication of a set by a positive number multiplies its sup and inf; multiplication 
by a negative number also exchanges its sup and inf. 


Proposition 2.23. If c ≥ 0, then 
\[ \sup cA = c \sup A, \qquad \inf cA = c \inf A. \]


If c < 0, then 
\[ \sup cA = c \inf A, \qquad \inf cA = c \sup A. \]


Proof. The result is obvious if c = 0. If c > 0, then cx ≤ M if and only if 
x ≤ M/c, which shows that M is an upper bound of cA if and only if M/c is an 
upper bound of A, so sup cA = c sup A. If c < 0, then cx ≤ M if and only if 
x ≥ M/c, so M is an upper bound of cA if and only if M/c is a lower bound of A, 
so sup cA = c inf A. The remaining results follow similarly. 


If A, B ⊂ ℝ, then we define 
\[ A + B = \{ z \in \mathbb{R} : z = x + y \text{ for some } x \in A,\ y \in B \}, \]
\[ A - B = \{ z \in \mathbb{R} : z = x - y \text{ for some } x \in A,\ y \in B \}. \]


Proposition 2.24. If A, B are nonempty sets, then 
sup(A + B) = sup A + sup B, inf(A + B) = inf A + inf B, 
sup(A — B) = sup A — inf B, inf(A — B) = inf A — sup B. 

Proof. The set A + B is bounded from above if and only if A and B are bounded 
from above, so sup(A + B) exists if and only if both sup A and sup B exist. In that 
case, if x ∈ A and y ∈ B, then 
\[ x + y \le \sup A + \sup B, \]
so sup A + sup B is an upper bound of A + B, and therefore 
\[ \sup(A + B) \le \sup A + \sup B. \]


To get the inequality in the opposite direction, suppose that ε > 0. Then there 
exist x ∈ A and y ∈ B such that 
\[ x > \sup A - \frac{\epsilon}{2}, \qquad y > \sup B - \frac{\epsilon}{2}. \]
It follows that 
\[ x + y > \sup A + \sup B - \epsilon \]
for every ε > 0, which implies that 
\[ \sup(A + B) \ge \sup A + \sup B. \]


Thus, sup(A + B) = sup A + sup B. It follows from this result and Proposition 2.23 
that 
\[ \sup(A - B) = \sup A + \sup(-B) = \sup A - \inf B. \]


The proof of the results for inf(A + B) and inf(A − B) is similar, or we can 
apply the results for the supremum to −A and −B. 
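
The four identities of Proposition 2.24 can be checked directly on finite sets, where the supremum and infimum are simply the maximum and minimum. The following is a small sanity check under that assumption, not a proof; the helper names `set_sum` and `set_diff` and the sample sets are illustrative.

```python
def set_sum(A, B):
    """A + B = {x + y : x in A, y in B}."""
    return {x + y for x in A for y in B}


def set_diff(A, B):
    """A - B = {x - y : x in A, y in B}."""
    return {x - y for x in A for y in B}


# For finite nonempty sets, sup = max and inf = min.
A = {-1, 0, 3}
B = {2, 5}

checks = {
    "sup(A+B) = sup A + sup B": max(set_sum(A, B)) == max(A) + max(B),
    "inf(A+B) = inf A + inf B": min(set_sum(A, B)) == min(A) + min(B),
    "sup(A-B) = sup A - inf B": max(set_diff(A, B)) == max(A) - min(B),
    "inf(A-B) = inf A - sup B": min(set_diff(A, B)) == min(A) - max(B),
}
for name, ok in checks.items():
    print(name, ok)
```

Note how the difference identities pair the supremum of one set with the infimum of the other, as in the proposition.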


Finally, we prove that taking the supremum over a pair of indices gives the 
same result as taking successive suprema over each index separately. 


Proposition 2.25. Suppose that 
\[ \{ x_{ij} : i \in I,\ j \in J \} \]
is a doubly-indexed set of real numbers. Then 
\[ \sup_{(i,j) \in I \times J} x_{ij} = \sup_{i \in I} \Big( \sup_{j \in J} x_{ij} \Big). \]


Proof. For each a ∈ I, we have {a} × J ⊂ I × J, so 
\[ \sup_{j \in J} x_{aj} \le \sup_{(i,j) \in I \times J} x_{ij}. \]
Taking the supremum of this inequality over a ∈ I, and replacing ‘a’ by ‘i’, we get 
that 
\[ \sup_{i \in I} \Big( \sup_{j \in J} x_{ij} \Big) \le \sup_{(i,j) \in I \times J} x_{ij}. \]
To prove the reverse inequality, first note that if 
\[ \sup_{(i,j) \in I \times J} x_{ij} \]
is finite, then given ε > 0 there exist a ∈ I, b ∈ J such that 
\[ x_{ab} > \sup_{(i,j) \in I \times J} x_{ij} - \epsilon. \]
It follows that 
\[ \sup_{j \in J} x_{aj} > \sup_{(i,j) \in I \times J} x_{ij} - \epsilon, \]
and therefore that 
\[ \sup_{i \in I} \Big( \sup_{j \in J} x_{ij} \Big) > \sup_{(i,j) \in I \times J} x_{ij} - \epsilon. \]
Since ε > 0 is arbitrary, we have 
\[ \sup_{i \in I} \Big( \sup_{j \in J} x_{ij} \Big) \ge \sup_{(i,j) \in I \times J} x_{ij}. \]
Similarly, if 
\[ \sup_{(i,j) \in I \times J} x_{ij} = \infty, \]
then given M ∈ ℝ there exist a ∈ I, b ∈ J such that x_{ab} > M, and it follows that 
\[ \sup_{i \in I} \Big( \sup_{j \in J} x_{ij} \Big) > M. \]
Since M is arbitrary, we have 
\[ \sup_{i \in I} \Big( \sup_{j \in J} x_{ij} \Big) = \infty, \]
which completes the proof. 

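
For a finite doubly-indexed family, Proposition 2.25 says that the maximum over all pairs equals the maximum of the row maxima. The sketch below checks this on one hypothetical family x_{ij} = (i − 2)(j + 1); the index sets and the formula are assumptions chosen only for illustration.

```python
# Hypothetical finite index sets and family x_ij = (i - 2) * (j + 1).
I = range(4)
J = range(3)
x = {(i, j): (i - 2) * (j + 1) for i in I for j in J}

# Supremum over all pairs (i, j) at once.
joint_sup = max(x[i, j] for i in I for j in J)

# Iterated supremum: first over j for each fixed i, then over i.
iterated_sup = max(max(x[i, j] for j in J) for i in I)

print(joint_sup, iterated_sup)
```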
Chapter 3 


Sequences 


In this chapter, we discuss sequences. We say what it means for a sequence to 
converge, and define the limit of a convergent sequence. We begin with some 
preliminary results about the absolute value, which can be used to define a distance 
function, or metric, on R. In turn, convergence is defined in terms of this metric. 


3.1. The absolute value 


Definition 3.1. The absolute value of x ∈ ℝ is defined by 
\[ |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases} \]


Some basic properties of the absolute value are the following. 
Proposition 3.2. For all x, y ∈ ℝ: 
(a) |x| ≥ 0 and |x| = 0 if and only if x = 0; 
(b) |−x| = |x|; 
(c) |x + y| ≤ |x| + |y| (triangle inequality); 
(d) |xy| = |x| |y|. 


Proof. Parts (a), (b) follow immediately from the definition. Part (c) remains 
valid if we change the signs of both x and y or exchange x and y. Therefore we can 
assume that x ≥ 0 and |x| ≥ |y| without loss of generality, in which case x + y ≥ 0. 
If y ≥ 0, corresponding to the case when x and y have the same sign, then 
\[ |x + y| = x + y = |x| + |y|. \]
If y < 0, corresponding to the case when x and y have opposite signs and x + y ≥ 0, 
then 
\[ |x + y| = x + y = |x| - |y| < |x| + |y|, \]
which proves (c). Part (d) remains valid if we change x to −x or y to −y, so we 
can assume that x, y ≥ 0 without loss of generality. Then xy ≥ 0 and |xy| = xy = 
|x| |y|. 


One useful consequence of the triangle inequality is the following reverse trian- 
gle inequality. 


Proposition 3.3. If x, y ∈ ℝ, then 
\[ \big| |x| - |y| \big| \le |x - y|. \]


Proof. By the triangle inequality, 
\[ |x| = |x - y + y| \le |x - y| + |y|, \]
so |x| − |y| ≤ |x − y|. Similarly, exchanging x and y, we get |y| − |x| ≤ |x − y|, which 
proves the result. 


We can give an equivalent condition for the boundedness of a set by using the 
absolute value instead of upper and lower bounds as in Definition 2.9. 


Proposition 3.4. A set A ⊂ ℝ is bounded if and only if there exists a real number 
M ≥ 0 such that 
\[ |x| \le M \quad \text{for every } x \in A. \]


Proof. If the condition in the proposition holds, then M is an upper bound of 
A and −M is a lower bound, so A is bounded. Conversely, if A is bounded from 
above by M′ and from below by m′, then |x| ≤ M for every x ∈ A where M = 
max{|m′|, |M′|}. 


A third way to say that a set is bounded is in terms of its diameter. 
Definition 3.5. Let A ⊂ ℝ. The diameter of A is 
\[ \operatorname{diam} A = \sup \{ |x - y| : x, y \in A \}. \]
Then a set is bounded if and only if its diameter is finite. 


Example 3.6. If A = (−a, a), then diam A = 2a, and A is bounded. If A = 
(−∞, a), then diam A = ∞, and A is unbounded. 


3.2. Sequences 


A sequence (x_n) of real numbers is an ordered list of numbers x_n ∈ ℝ, called the 
terms of the sequence, indexed by the natural numbers n ∈ ℕ. We often indicate a 
sequence by listing the first few terms, especially if they have an obvious pattern. 
Of course, no finite list of terms is, on its own, sufficient to define a sequence. 


Figure 1. A plot of the first 40 terms in the sequence x_n = (1 + 1/n)^n, 
illustrating that it is monotone increasing and converges to e ≈ 2.718, whose 
value is indicated by the dashed line. 


Example 3.7. Here are some sequences: 


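\[ 1,\ 8,\ 27,\ 64,\ \dots, \qquad x_n = n^3, \]
\[ 1,\ \frac{1}{2},\ \frac{1}{3},\ \frac{1}{4},\ \dots, \qquad x_n = \frac{1}{n}, \]
\[ 1,\ -1,\ 1,\ -1,\ \dots, \qquad x_n = (-1)^{n+1}, \]
\[ (1 + 1),\ \Big(1 + \frac{1}{2}\Big)^2,\ \Big(1 + \frac{1}{3}\Big)^3,\ \dots, \qquad x_n = \Big(1 + \frac{1}{n}\Big)^n. \]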


Note that unlike sets, where elements are not repeated, the terms in a sequence 
may be repeated. 


The formal definition of a sequence is as a function on N, which is equivalent 
to its definition as a list. 


Definition 3.8. A sequence (x_n) of real numbers is a function f : ℕ → ℝ, where 
x_n = f(n). 


We can consider sequences of many different types of objects (for example, 
sequences of functions) but for now we only consider sequences of real numbers, 
and we will refer to them as sequences for short. A useful way to visualize a 
sequence (x_n) is to plot the graph of x_n ∈ ℝ versus n ∈ ℕ. (See Figure 1 for an 
example.) 




If we want to indicate the range of the index n ∈ ℕ explicitly, we write the 
sequence as $(x_n)_{n=1}^{\infty}$. Sometimes it is convenient to start numbering a sequence 
from a different integer, such as n = 0 instead of n = 1. In that case, a sequence 
$(x_n)_{n=0}^{\infty}$ is a function f : ℕ₀ → ℝ where x_n = f(n) and ℕ₀ = {0, 1, 2, 3, ...}, and 
similarly for other starting points. 

Every function f : ℕ → ℝ defines a sequence, corresponding to an arbitrary 
choice of a real number x_n ∈ ℝ for each n ∈ ℕ. Some sequences can be defined 
explicitly by giving an expression for the nth terms, as in Example 3.7; others can 
be defined recursively. That is, we specify the value of the initial term (or terms) in 
the sequence, and define x_n as a function of the previous terms (x_1, x_2, ..., x_{n−1}). 


A well-known example of a recursive sequence is the Fibonacci sequence (F_n) 
\[ 1,\ 1,\ 2,\ 3,\ 5,\ 8,\ 13,\ \dots, \]
which is defined by F_1 = F_2 = 1 and 
\[ F_n = F_{n-1} + F_{n-2} \quad \text{for } n \ge 3. \]


That is, we add the two preceding terms to get the next term. In general, we cannot 
expect to solve a recursion relation to get an explicit expression for the nth term 
in a sequence, but the recursion relation for the Fibonacci sequence is linear with 
constant coefficients, and it can be solved to give an expression for F_n called the 
Euler-Binet formula. 


Proposition 3.9 (Euler-Binet formula). The nth term in the Fibonacci sequence 
is given by 
\[ F_n = \frac{1}{\sqrt{5}} \left[ \phi^n - \Big( -\frac{1}{\phi} \Big)^n \right], \qquad \phi = \frac{1 + \sqrt{5}}{2}. \]


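Proof. The terms in the Fibonacci sequence are uniquely determined by the linear 
difference equation 
\[ F_n - F_{n-1} - F_{n-2} = 0, \qquad n \ge 3, \]
with the initial conditions 
\[ F_1 = 1, \qquad F_2 = 1. \]
We see that F_n = r^n is a solution of the difference equation if r satisfies 
\[ r^2 - r - 1 = 0, \]
which gives 
\[ r = \phi \ \text{ or } \ -\frac{1}{\phi}, \qquad \phi = \frac{1 + \sqrt{5}}{2} \approx 1.61803. \]
By linearity, 
\[ F_n = A \phi^n + B \Big( -\frac{1}{\phi} \Big)^n \]
is a solution of the difference equation for arbitrary constants A, B. This solution 
satisfies the initial conditions F_1 = F_2 = 1 if 
\[ A = \frac{1}{\sqrt{5}}, \qquad B = -\frac{1}{\sqrt{5}}, \]
which proves the result. 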




Alternatively, once we know the answer, we can prove Proposition 3.9 by 
induction. The details are left as an exercise. Note that although the right-hand side 
of the equation for F_n involves the irrational number √5, its value is an integer for 
every n ∈ ℕ. 
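
As a numerical illustration (not a substitute for the induction proof), one can compare the recursive definition of F_n with the Euler-Binet formula. The sketch below evaluates the formula in floating-point arithmetic, so the comparison rounds to the nearest integer and is only trusted for moderate n; the function names are hypothetical.

```python
import math


def fib_by_recursion(n: int) -> int:
    """F_1 = F_2 = 1 and F_n = F_{n-1} + F_{n-2}, computed iteratively."""
    a, b = 1, 1  # (F_1, F_2)
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fib_by_binet(n: int) -> float:
    """Floating-point evaluation of (phi^n - (-1/phi)^n) / sqrt(5)."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return (phi**n - (-1 / phi) ** n) / sqrt5


for n in range(1, 11):
    print(n, fib_by_recursion(n), fib_by_binet(n))
```

The printed floating-point values agree with the integers 1, 1, 2, 3, 5, ... up to rounding error, even though √5 appears in the formula.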


The number φ appearing in Proposition 3.9 is called the golden ratio. It has 
the property that subtracting 1 from it gives its reciprocal, or 
\[ \phi - 1 = \frac{1}{\phi}. \]
Geometrically, this property means that the removal of a square from a rectangle 
whose sides are in the ratio φ leaves a rectangle whose sides are in the same ratio. 
The number φ was originally defined in Euclid’s Elements as the division of a line in 
“extreme and mean ratio,” and Ancient Greek architects arguably used rectangles 
with this proportion in the Parthenon and other buildings. During the Renaissance, 
φ was referred to as the “divine proportion.” The first use of the term “golden 
section” appears to be by Martin Ohm, brother of the physicist Georg Ohm, in a 
book published in 1835. 


3.3. Convergence and limits 


Roughly speaking, a sequence (x_n) converges to a limit x if its terms x_n get 
arbitrarily close to x for all sufficiently large n. 


Definition 3.10. A sequence (x_n) of real numbers converges to a limit x ∈ ℝ, 
written 
\[ x = \lim_{n \to \infty} x_n, \quad \text{or} \quad x_n \to x \text{ as } n \to \infty, \]
if for every ε > 0 there exists N ∈ ℕ such that 
\[ |x_n - x| < \epsilon \quad \text{for all } n > N. \]


A sequence converges if it converges to some limit x € R, otherwise it diverges. 


Although we don’t show it explicitly in the definition, N is allowed to depend 
on ε. Typically, the smaller we choose ε, the larger we have to make N. One way 
to view a proof of convergence is as a game: If I give you an ε > 0, you have to 
come up with an N that “works.” Also note that x_n → x as n → ∞ means the 
same thing as |x_n − x| → 0 as n → ∞. 


It may appear obvious that a limit is unique if one exists, but this fact requires 
proof. 


Proposition 3.11. If a sequence converges, then its limit is unique. 


Proof. Suppose that (x_n) is a sequence such that x_n → x and x_n → x′ as n → ∞. 
Let ε > 0. Then there exist N, N′ ∈ ℕ such that 
\[ |x_n - x| < \frac{\epsilon}{2} \quad \text{for all } n > N, \]
\[ |x_n - x'| < \frac{\epsilon}{2} \quad \text{for all } n > N'. \]
Choose any n > max{N, N′}. Then, by the triangle inequality, 
\[ |x - x'| \le |x - x_n| + |x_n - x'| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
Since this inequality holds for all ε > 0, we must have |x − x′| = 0 (otherwise the 
inequality would be false for ε = |x − x′|/2 > 0), so x = x′. 


The following notation for sequences that “diverge to infinity” is convenient. 
Definition 3.12. If (x_n) is a sequence then 
\[ \lim_{n \to \infty} x_n = \infty, \]
or x_n → ∞ as n → ∞, if for every M ∈ ℝ there exists N ∈ ℕ such that 
\[ x_n > M \quad \text{for all } n > N. \]
Also 
\[ \lim_{n \to \infty} x_n = -\infty, \]
or x_n → −∞ as n → ∞, if for every M ∈ ℝ there exists N ∈ ℕ such that 
\[ x_n < M \quad \text{for all } n > N. \]


That is, x_n → ∞ as n → ∞ means the terms of the sequence (x_n) get arbitrarily 
large and positive for all sufficiently large n, while x_n → −∞ as n → ∞ means 
that the terms get arbitrarily large and negative for all sufficiently large n. The 
notation x_n → ±∞ does not mean that the sequence converges. 


To illustrate these definitions, we discuss the convergence of the sequences in 
Example 3.7. 


Example 3.13. The terms in the sequence 
\[ 1,\ 8,\ 27,\ 64,\ \dots, \qquad x_n = n^3 \]
eventually exceed any real number, so n³ → ∞ as n → ∞ and this sequence 
does not converge. Explicitly, let M ∈ ℝ be given, and choose N ∈ ℕ such that 
N > M^{1/3}. (If −∞ < M ≤ 1, we can choose N = 1.) Then for all n > N, we have 
n³ > N³ ≥ M, which proves the result. 


Example 3.14. The terms in the sequence 
\[ 1,\ \frac{1}{2},\ \frac{1}{3},\ \frac{1}{4},\ \dots, \qquad x_n = \frac{1}{n} \]
get closer to 0 as n gets larger, and the sequence converges to 0: 
\[ \lim_{n \to \infty} \frac{1}{n} = 0. \]
To prove this limit, let ε > 0 be given. Choose N ∈ ℕ such that N > 1/ε. (Such a 
number exists by the Archimedean property of ℝ stated in Theorem 2.18.) Then 
for all n > N 
\[ \left| \frac{1}{n} - 0 \right| = \frac{1}{n} < \frac{1}{N} < \epsilon, \]
which proves that 1/n → 0 as n → ∞. 
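
The "game" in this proof is easy to play mechanically: given ε, produce an N that works. The sketch below makes the choice N = ⌊1/ε⌋ + 1, which is one valid choice satisfying N > 1/ε (the proof only needs some such N), and uses exact rational arithmetic so the inequality checks are not affected by rounding. The function name is hypothetical, and the check over a finite range of n is an illustration, not a proof.

```python
import math
from fractions import Fraction


def N_for_epsilon(eps: Fraction) -> int:
    """Return a natural number N with N > 1/eps, so that 1/n < eps for n > N."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return math.floor(1 / eps) + 1


for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(3, 1000)):
    N = N_for_epsilon(eps)
    # Spot-check |1/n - 0| < eps on a finite range of indices n > N.
    ok = all(Fraction(1, n) < eps for n in range(N + 1, N + 500))
    print(eps, N, ok)
```

Smaller values of ε force larger values of N, as remarked after Definition 3.10.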




Example 3.15. The terms in the sequence 
\[ 1,\ -1,\ 1,\ -1,\ \dots, \qquad x_n = (-1)^{n+1}, \]
oscillate back and forth infinitely often between 1 and −1, but they do not approach 
any fixed limit, so the sequence does not converge. To show this explicitly, note 
that for every x ∈ ℝ we have either |x − 1| ≥ 1 or |x + 1| ≥ 1. It follows that there 
is no N ∈ ℕ such that |x_n − x| < 1 for all n > N. Thus, Definition 3.10 fails if 
ε = 1 however we choose x ∈ ℝ, and the sequence does not converge. 


Example 3.16. The convergence of the sequence 
\[ (1 + 1),\ \Big(1 + \frac{1}{2}\Big)^2,\ \Big(1 + \frac{1}{3}\Big)^3,\ \dots, \qquad x_n = \Big(1 + \frac{1}{n}\Big)^n, \]
illustrated in Figure 1, is less obvious. Its terms are given by 
\[ x_n = a_n^n, \qquad a_n = 1 + \frac{1}{n}. \]
As n increases, we take larger powers of numbers that get closer to one. If a > 1 
is any fixed real number, then a^n → ∞ as n → ∞ so the sequence (a^n) does not 
converge (see Proposition 3.31 below for a detailed proof). On the other hand, if 
a = 1, then 1^n = 1 for all n ∈ ℕ so the sequence (1^n) converges to 1. Thus, there 
are two competing factors in the sequence with increasing n: a_n → 1 but n → ∞. 
It is not immediately obvious which of these factors “wins.” 


In fact, they are in balance. As we prove in a proposition below, the sequence 
converges with 
\[ \lim_{n \to \infty} \Big(1 + \frac{1}{n}\Big)^n = e, \]
where 2 < e < 3. This limit can be taken as the definition of e ≈ 2.71828. 


For comparison, one can also show that 
\[ \lim_{n \to \infty} \Big(1 + \frac{1}{n^2}\Big)^n = 1, \qquad \lim_{n \to \infty} \Big(1 + \frac{1}{n}\Big)^{n^2} = \infty. \]
In the first case, the rapid approach of a_n = 1 + 1/n² to 1 “beats” the slower growth 
in the exponent n, while in the second case, the rapid growth in the exponent n² 
“beats” the slower approach of a_n = 1 + 1/n to 1. 
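
The three competing behaviours in Example 3.16 can be observed numerically. The following sketch evaluates the three sequences in floating-point arithmetic; it only suggests the limits and proves nothing. The function names `balanced`, `base_wins` and `exponent_wins` are hypothetical labels introduced here.

```python
import math


def balanced(n: int) -> float:
    """(1 + 1/n)^n, which converges to e."""
    return (1 + 1 / n) ** n


def base_wins(n: int) -> float:
    """(1 + 1/n^2)^n, which converges to 1."""
    return (1 + 1 / n**2) ** n


def exponent_wins(n: int) -> float:
    """(1 + 1/n)^(n^2), which diverges to infinity."""
    return (1 + 1 / n) ** (n**2)


for n in (1, 10, 100):
    print(n, balanced(n), base_wins(n), exponent_wins(n))
print("e =", math.e)
```

The first column creeps up towards e, the second settles near 1, and the third grows without bound.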


An important property of a sequence is whether or not it is bounded. 


Definition 3.17. A sequence (x_n) of real numbers is bounded from above if there 
exists M ∈ ℝ such that x_n ≤ M for all n ∈ ℕ, and bounded from below if there 
exists m ∈ ℝ such that x_n ≥ m for all n ∈ ℕ. A sequence is bounded if it is 
bounded from above and below, otherwise it is unbounded. 


An equivalent condition for a sequence (x_n) to be bounded is that there exists 
M ≥ 0 such that 
\[ |x_n| \le M \quad \text{for all } n \in \mathbb{N}. \]




Example 3.18. The sequence (n³) is bounded from below but not from above, 
while the sequences (1/n) and ((−1)^{n+1}) are bounded. The sequence 
\[ 1,\ -2,\ 3,\ -4,\ 5,\ -6,\ \dots, \qquad x_n = (-1)^{n+1} n \]
is not bounded from below or above. 


We then have the following property of convergent sequences. 


Proposition 3.19. A convergent sequence is bounded. 


Proof. Let (x_n) be a convergent sequence with limit x. There exists N ∈ ℕ such 
that 
\[ |x_n - x| < 1 \quad \text{for all } n > N. \]
The triangle inequality implies that 
\[ |x_n| \le |x_n - x| + |x| < 1 + |x| \quad \text{for all } n > N. \]
Defining 
\[ M = \max \{ |x_1|, |x_2|, \dots, |x_N|, 1 + |x| \}, \]
we see that |x_n| ≤ M for all n ∈ ℕ, so (x_n) is bounded. 


Thus, boundedness is a necessary condition for convergence, and every 
unbounded sequence diverges; for example, the unbounded sequence in Example 3.13 
diverges. On the other hand, boundedness is not a sufficient condition for 
convergence; for example, the bounded sequence in Example 3.15 diverges. 


The boundedness, or convergence, of a sequence $(x_n)_{n=1}^{\infty}$ depends only on the 
behavior of the infinite “tails” $(x_n)_{n=N}^{\infty}$ of the sequence, where N is arbitrarily 
large. Equivalently, the sequence $(x_n)_{n=1}^{\infty}$ and the shifted sequences $(x_{n+N})_{n=1}^{\infty}$ 
have the same convergence properties and limits for every N ∈ ℕ. As a result, 
changing a finite number of terms in a sequence doesn’t alter its boundedness or 
convergence, nor does it alter the limit of a convergent sequence. In particular, the 
existence of a limit gives no information about how quickly a sequence converges 
to its limit. 
Example 3.20. Changing the first hundred terms of the sequence (1/n) from 1/n 
to n, we get the sequence 
\[ 1,\ 2,\ 3,\ \dots,\ 99,\ 100,\ \frac{1}{101},\ \frac{1}{102},\ \frac{1}{103},\ \dots, \]
which is still bounded (although by 100 instead of by 1) and still convergent to 
0. Similarly, changing the first billion terms in the sequence doesn’t change its 
boundedness or convergence. 


We introduce some convenient terminology to describe the behavior of “tails” 
of a sequence. 


Definition 3.21. Let P(x) denote a property of real numbers x ∈ ℝ. If (x_n) is a 
real sequence, then P(x_n) holds eventually if there exists N ∈ ℕ such that P(x_n) 
holds for all n > N; and P(x_n) holds infinitely often if for every N ∈ ℕ there exists 
n > N such that P(x_n) holds. 




For example, (x_n) is bounded if there exists M ≥ 0 such that |x_n| ≤ M 
eventually; and (x_n) does not converge to x ∈ ℝ if there exists ε₀ > 0 such that 
|x_n − x| ≥ ε₀ infinitely often. 

Note that if a property P holds infinitely often according to Definition 3.21, 
then it does indeed hold infinitely often: If N = 1, then there exists n_1 > 1 such 
that P(x_{n_1}) holds; if N = n_1, then there exists n_2 > n_1 such that P(x_{n_2}) holds; 
then there exists n_3 > n_2 such that P(x_{n_3}) holds, and so on. 


3.4. Properties of limits 


In this section, we prove some order and algebraic properties of limits of sequences. 


3.4.1. Monotonicity. Limits of convergent sequences preserve (non-strict) 
inequalities. 


Theorem 3.22. If (x_n) and (y_n) are convergent sequences and x_n ≤ y_n for all 
n ∈ ℕ, then 
\[ \lim_{n \to \infty} x_n \le \lim_{n \to \infty} y_n. \]


Proof. Suppose that x_n → x and y_n → y as n → ∞. Then for every ε > 0 there 
exist P, Q ∈ ℕ such that 
\[ |x - x_n| < \frac{\epsilon}{2} \quad \text{for all } n > P, \]
\[ |y - y_n| < \frac{\epsilon}{2} \quad \text{for all } n > Q. \]
Choosing n > max{P, Q}, we have 
\[ x = x_n + x - x_n < y_n + \frac{\epsilon}{2} = y + y_n - y + \frac{\epsilon}{2} < y + \epsilon. \]
Since x < y + ε for every ε > 0, it follows that x ≤ y. 


This result, of course, remains valid if the inequality x_n ≤ y_n holds only for 
all sufficiently large n. Limits need not preserve strict inequalities. For example, 
1/n > 0 for all n ∈ ℕ but lim_{n→∞} 1/n = 0. 

It follows immediately that if (x_n) is a convergent sequence with m ≤ x_n ≤ M 
for all n ∈ ℕ, then 
\[ m \le \lim_{n \to \infty} x_n \le M. \]


The following “squeeze” or “sandwich” theorem is often useful in proving the 
convergence of a sequence by bounding it between two simpler convergent sequences 
with equal limits. 


Theorem 3.23 (Sandwich). Suppose that (x_n) and (y_n) are convergent sequences 
of real numbers with the same limit L. If (z_n) is a sequence such that 
\[ x_n \le z_n \le y_n \quad \text{for all } n \in \mathbb{N}, \]
then (z_n) also converges to L. 




Proof. Let ε > 0 be given, and choose P, Q ∈ ℕ such that 
\[ |x_n - L| < \epsilon \ \text{ for all } n > P, \qquad |y_n - L| < \epsilon \ \text{ for all } n > Q. \]
If N = max{P, Q}, then for all n > N 
\[ -\epsilon < x_n - L \le z_n - L \le y_n - L < \epsilon, \]
which implies that |z_n − L| < ε. This proves the result. 


It is essential here that (x_n) and (y_n) have the same limit. 


Example 3.24. If x_n = −1, y_n = 1, and z_n = (−1)^{n+1}, then x_n ≤ z_n ≤ y_n for all 
n ∈ ℕ, the sequence (x_n) converges to −1 and (y_n) converges to 1, but (z_n) does not 
converge. 


As one consequence, we show that we can take absolute values inside limits. 


Corollary 3.25. If x_n → x as n → ∞, then |x_n| → |x| as n → ∞. 


Proof. By the reverse triangle inequality, 
\[ 0 \le \big| |x_n| - |x| \big| \le |x_n - x|, \]
and the result follows from Theorem 3.23. 


3.4.2. Linearity. Limits respect addition and multiplication. In proving the 
following theorem, we need to show that the sequences converge, not just get 
expressions for their limits. 


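Theorem 3.26. Suppose that (x_n) and (y_n) are convergent real sequences and 
c ∈ ℝ. Then the sequences (c x_n), (x_n + y_n), and (x_n y_n) converge, and 
\[ \lim_{n \to \infty} c x_n = c \lim_{n \to \infty} x_n, \]
\[ \lim_{n \to \infty} (x_n + y_n) = \lim_{n \to \infty} x_n + \lim_{n \to \infty} y_n, \]
\[ \lim_{n \to \infty} (x_n y_n) = \Big( \lim_{n \to \infty} x_n \Big) \Big( \lim_{n \to \infty} y_n \Big). \]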
Proof. We let 
\[ x = \lim_{n \to \infty} x_n, \qquad y = \lim_{n \to \infty} y_n. \]
The first statement is immediate if c = 0. Otherwise, let ε > 0 be given, and choose 
N ∈ ℕ such that 
\[ |x_n - x| < \frac{\epsilon}{|c|} \quad \text{for all } n > N. \]
Then 
\[ |c x_n - c x| < \epsilon \quad \text{for all } n > N, \]
which proves that (c x_n) converges to cx. 

For the second statement, let ε > 0 be given, and choose P, Q ∈ ℕ such that 
\[ |x_n - x| < \frac{\epsilon}{2} \ \text{ for all } n > P, \qquad |y_n - y| < \frac{\epsilon}{2} \ \text{ for all } n > Q. \]
Let N = max{P, Q}. Then for all n > N, we have 
\[ |(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y| < \epsilon, \]
which proves that (x_n + y_n) converges to x + y. 


For the third statement, note that since (x_n) and (y_n) converge, they are 
bounded and there exists M > 0 such that 
\[ |x_n|, |y_n| \le M \quad \text{for all } n \in \mathbb{N} \]
and |x|, |y| ≤ M. Given ε > 0, choose P, Q ∈ ℕ such that 
\[ |x_n - x| < \frac{\epsilon}{2M} \ \text{ for all } n > P, \qquad |y_n - y| < \frac{\epsilon}{2M} \ \text{ for all } n > Q, \]
and let N = max{P, Q}. Then for all n > N, 
\[ |x_n y_n - x y| = |(x_n - x) y_n + x (y_n - y)| \]
\[ \le |x_n - x| \, |y_n| + |x| \, |y_n - y| \]
\[ \le M \big( |x_n - x| + |y_n - y| \big) \]
\[ < \epsilon, \]
which proves that (x_n y_n) converges to xy. 


Note that the convergence of (x_n + y_n) does not imply the convergence of (x_n) 
and (y_n) separately; for example, take x_n = n and y_n = −n. If, however, (x_n) 
converges then (y_n) converges if and only if (x_n + y_n) converges. 


3.5. Monotone sequences 


Monotone sequences have particularly simple convergence properties. 


Definition 3.27. A sequence (x_n) of real numbers is increasing if x_{n+1} ≥ x_n for 
all n ∈ ℕ, decreasing if x_{n+1} ≤ x_n for all n ∈ ℕ, and monotone if it is increasing 
or decreasing. A sequence is strictly increasing if x_{n+1} > x_n, strictly decreasing if 
x_{n+1} < x_n, and strictly monotone if it is strictly increasing or strictly decreasing. 


We don’t require a monotone sequence to be strictly monotone, but this 
usage isn’t universal. In some places, “increasing” or “decreasing” is used to mean 
“strictly increasing” or “strictly decreasing.” In that case, what we call an 
increasing sequence is called a nondecreasing sequence and a decreasing sequence is called 
a nonincreasing sequence. We’ll use the more easily understood direct terminology. 


Example 3.28. The sequence 
\[ 1,\ 2,\ 2,\ 3,\ 3,\ 3,\ 4,\ 4,\ 4,\ 4,\ 5,\ \dots \]
is monotone increasing but not strictly monotone increasing; the sequence (n³) is 
strictly monotone increasing; the sequence (1/n) is strictly monotone decreasing; 
and the sequence ((−1)^{n+1}) is not monotone. 


Bounded monotone sequences always converge, and unbounded monotone 
sequences diverge to ±∞. 


Theorem 3.29. A monotone sequence of real numbers converges if and only if it 
is bounded. If (x_n) is monotone increasing and bounded, then 
\[ \lim_{n \to \infty} x_n = \sup \{ x_n : n \in \mathbb{N} \}, \]
and if (x_n) is monotone decreasing and bounded, then 
\[ \lim_{n \to \infty} x_n = \inf \{ x_n : n \in \mathbb{N} \}. \]
Furthermore, if (x_n) is monotone increasing and unbounded, then 
\[ \lim_{n \to \infty} x_n = \infty, \]
and if (x_n) is monotone decreasing and unbounded, then 
\[ \lim_{n \to \infty} x_n = -\infty. \]


Proof. If the sequence converges, then by Proposition 3.19 it is bounded. 


Conversely, suppose that (x_n) is a bounded, monotone increasing sequence. 
The set of terms {x_n : n ∈ ℕ} is bounded from above, so by Axiom 2.17 it has a 
supremum 
\[ x = \sup \{ x_n : n \in \mathbb{N} \}. \]
Let ε > 0. From the definition of the supremum, there exists an N ∈ ℕ such that 
x_N > x − ε. Since the sequence is increasing, we have x_n ≥ x_N for all n > N, and 
therefore x − ε < x_n ≤ x. It follows that 
\[ |x_n - x| < \epsilon \quad \text{for all } n > N, \]
which proves that x_n → x as n → ∞. 


If (x_n) is an unbounded monotone increasing sequence, then it is not bounded 
from above, since it is bounded from below by its first term x_1. Hence, for every 
M ∈ ℝ there exists N ∈ ℕ such that x_N > M. Since the sequence is increasing, we 
have x_n ≥ x_N > M for all n > N, which proves that x_n → ∞ as n → ∞. 


The result for a monotone decreasing sequence (x_n) follows similarly, or by 
applying the previous result to the monotone increasing sequence (−x_n). 


The fact that every bounded monotone sequence has a limit is another way to 
express the completeness of ℝ. For example, this is not true in ℚ: an increasing 
sequence of rational numbers that converges to √2 is bounded from above in ℚ (for 
example, by 2) but has no limit in ℚ. 

We sometimes use the notation x, f x to indicate that (xn) is a monotone 
increasing sequence that converges to x, and £n | «x to indicate that (zn) is a 
monotone decreasing sequence that converges to x, with a similar notation for 
monotone sequences that diverge to too. For example, 1/n | 0 and n? + œo as 
n —> oo. 


The following propositions give some examples of monotone sequences. In the 
proofs, we use the binomial theorem, which we state without proof. 


Theorem 3.30 (Binomial). If $x, y \in \mathbb{R}$ and $n \in \mathbb{N}$, then
\[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \]

Here, $n! = 1 \cdot 2 \cdot 3 \cdots n$ and, by convention, $0! = 1$. The binomial coefficients
$\binom{n}{k}$, read “n choose k,” give the number of ways of choosing $k$ objects from $n$ objects,
order not counting. For example,
\[ (x+y)^2 = x^2 + 2xy + y^2, \]
\[ (x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3, \]
\[ (x+y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4. \]


We also recall the sum of a geometric series: if $a \ne 1$, then
\[ \sum_{k=0}^{n} a^k = \frac{1 - a^{n+1}}{1 - a}. \]


Proposition 3.31. The geometric sequence $(a^n)_{n=0}^{\infty}$,
\[ 1, a, a^2, a^3, \dots, \]
is strictly monotone decreasing if $0 < a < 1$, with
\[ \lim_{n\to\infty} a^n = 0, \]
and strictly monotone increasing if $1 < a < \infty$, with
\[ \lim_{n\to\infty} a^n = \infty. \]


Proof. If $0 < a < 1$, then $0 < a^{n+1} = a \cdot a^n < a^n$, so the sequence $(a^n)$ is strictly
monotone decreasing and bounded from below by zero. Therefore by Theorem
it has a limit $x \in \mathbb{R}$. Theorem implies that
\[ x = \lim_{n\to\infty} a^{n+1} = \lim_{n\to\infty} a \cdot a^n = a \lim_{n\to\infty} a^n = ax. \]
Since $a \ne 1$, it follows that $x = 0$.

If $a > 1$, then $a^{n+1} = a \cdot a^n > a^n$, so $(a^n)$ is strictly increasing. Let $a = 1 + \delta$
where $\delta > 0$. By the binomial theorem, we have
\[ a^n = (1+\delta)^n = 1 + n\delta + \frac{1}{2} n(n-1)\delta^2 + \dots + \delta^n \ge 1 + n\delta. \]
Given $M \ge 0$, choose $N \in \mathbb{N}$ such that $N > M/\delta$. Then for all $n > N$, we have
\[ a^n \ge 1 + n\delta > 1 + N\delta > M, \]
so $a^n \to \infty$ as $n \to \infty$.
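
The choice of $N$ in the second half of the proof can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `threshold_index` and the sample values of `a` and `M` are assumptions made for illustration. It uses exact rational arithmetic so that the inequalities are tested without rounding error.

```python
from fractions import Fraction
from math import floor


def threshold_index(a: Fraction, M: Fraction) -> int:
    """Return an N with N > M/delta, where a = 1 + delta and delta > 0.

    Hypothetical helper: it mirrors the choice of N made in the proof.
    """
    delta = a - 1
    if delta <= 0:
        raise ValueError("this sketch assumes a > 1")
    # Smallest integer strictly greater than M/delta.
    return floor(M / delta) + 1


a = Fraction(11, 10)   # sample value: a = 1.1, so delta = 1/10
M = Fraction(50)       # sample bound to be exceeded
N = threshold_index(a, M)

# For n > N the chain a^n >= 1 + n*delta > M from the proof should hold.
for n in range(N + 1, N + 6):
    assert a**n >= 1 + n * (a - 1) > M

print(N)
```

The bound $1 + n\delta$ is far from sharp (here $a^n$ exceeds $M$ long before $n$ reaches $N$), but the proof only needs some $N$ that works, not the smallest one.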


The next proposition proves the existence of the limit for e in Example 
Proposition 3.32. The sequence $(x_n)$ with
\[ x_n = \left(1 + \frac{1}{n}\right)^n \]
is strictly monotone increasing and converges to a limit $2 < e < 3$.


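Proof. By the binomial theorem,
\[ \left(1 + \frac{1}{n}\right)^n = 1 + n \cdot \frac{1}{n} + \frac{n(n-1)}{2!} \cdot \frac{1}{n^2} + \frac{n(n-1)(n-2)}{3!} \cdot \frac{1}{n^3} + \dots + \frac{n(n-1)(n-2) \cdots 2 \cdot 1}{n!} \cdot \frac{1}{n^n} \]
\[ = 2 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \dots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \frac{2}{n} \cdot \frac{1}{n}. \]

Each of the terms in the sum on the right hand side is a positive increasing function
of $n$, and the number of terms increases with $n$. Therefore $(x_n)$ is a strictly
increasing sequence, and $x_n > 2$ for every $n \ge 2$. Moreover, since $0 \le (1 - k/n) < 1$
for $1 \le k \le n$, we have
\[ \left(1 + \frac{1}{n}\right)^n < 2 + \frac{1}{2!} + \frac{1}{3!} + \dots + \frac{1}{n!}. \]

Since $n! \ge 2^{n-1}$ for $n \ge 1$, it follows that
\[ \left(1 + \frac{1}{n}\right)^n < 2 + \frac{1}{2} + \frac{1}{2^2} + \dots + \frac{1}{2^{n-1}} = 2 + \frac{1}{2}\left[\frac{1 - (1/2)^{n-1}}{1 - 1/2}\right] < 3, \]
so $(x_n)$ is monotone increasing and bounded from above by a number strictly less
than 3. By Theorem the sequence converges to a limit $2 < e < 3$.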
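
As a sanity check on Proposition 3.32, we can compute the first few hundred terms exactly. This is only a sketch under stated assumptions: the function name `x` and the cutoff of 200 terms are choices made here, and a finite computation illustrates the monotonicity and the bounds without proving them.

```python
from fractions import Fraction


def x(n: int) -> Fraction:
    """Exact value of (1 + 1/n)^n as a rational number."""
    return (1 + Fraction(1, n)) ** n


# The first 200 terms (the cutoff is arbitrary).
terms = [x(n) for n in range(1, 201)]

# Each term should be strictly smaller than the next one.
strictly_increasing = all(s < t for s, t in zip(terms, terms[1:]))

# x_1 = 2, and every term should stay below 3.
within_bounds = all(2 <= t < 3 for t in terms)

print(strictly_increasing, within_bounds)
print(float(terms[-1]))  # still noticeably below e: convergence is slow
```

The convergence is slow: the error in $x_n$ is roughly $e/(2n)$, so $x_{200}$ agrees with $e$ to only about two decimal places.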


3.6. The lim sup and lim inf


The lim sup and lim inf allow us to reduce questions about the convergence and lim- 
its of general real sequences to ones about monotone sequences. They are somewhat 
subtle concepts, and after defining them we will consider a number of examples. 


Unlike the limit, the lim sup and lim inf of every bounded sequence of real
numbers exist. A sequence converges if and only if its lim sup and lim inf are equal,
in which case its limit is their common value. Furthermore, a sequence is unbounded
if and only if at least one of its lim sup or lim inf diverges to $\pm\infty$, and it diverges
to $\pm\infty$ if and only if both its lim sup and lim inf diverge to $\pm\infty$.


In order to define the lim sup and lim inf of a sequence $(x_n)$ of real numbers,
we introduce the sequences $(y_n)$ and $(z_n)$ obtained by taking the supremum and
infimum, respectively, of the “tails” of $(x_n)$:
\[ y_n = \sup\{x_k : k \ge n\}, \qquad z_n = \inf\{x_k : k \ge n\}. \]

As $n$ increases, the supremum and infimum are taken over smaller sets, so $(y_n)$ is
monotone decreasing and $(z_n)$ is monotone increasing. The limits of these sequences
are the lim sup and lim inf, respectively, of the original sequence.


Definition 3.33. Suppose that $(x_n)$ is a sequence of real numbers. Then
\[ \limsup_{n\to\infty} x_n = \lim_{n\to\infty} y_n, \qquad y_n = \sup\{x_k : k \ge n\}, \]
\[ \liminf_{n\to\infty} x_n = \lim_{n\to\infty} z_n, \qquad z_n = \inf\{x_k : k \ge n\}. \]

Note that $\limsup x_n$ exists and is finite provided that each of the $y_n$ is finite
and $(y_n)$ is bounded from below; similarly, $\liminf x_n$ exists and is finite provided
that each of the $z_n$ is finite and $(z_n)$ is bounded from above. We may also write
\[ \limsup_{n\to\infty} x_n = \inf_{n\in\mathbb{N}} \left(\sup_{k\ge n} x_k\right), \qquad \liminf_{n\to\infty} x_n = \sup_{n\in\mathbb{N}} \left(\inf_{k\ge n} x_k\right). \]

As for the limit of monotone sequences, it is convenient to allow the lim inf or
lim sup to be $\pm\infty$, and we state explicitly what this means. We have $-\infty < y_n \le \infty$
for every $n \in \mathbb{N}$, since the supremum of a non-empty set cannot be $-\infty$, but we
may have $y_n \downarrow -\infty$; similarly, $-\infty \le z_n < \infty$, but we may have $z_n \uparrow \infty$. These
possibilities lead to the following cases.


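Definition 3.34. Suppose that $(x_n)$ is a sequence of real numbers and the sequences
$(y_n)$, $(z_n)$ of possibly extended real numbers are given by Definition 3.33.
Then
\[ \limsup_{n\to\infty} x_n = \infty \quad \text{if } y_n = \infty \text{ for every } n \in \mathbb{N}, \]
\[ \limsup_{n\to\infty} x_n = -\infty \quad \text{if } y_n \downarrow -\infty \text{ as } n \to \infty, \]
\[ \liminf_{n\to\infty} x_n = -\infty \quad \text{if } z_n = -\infty \text{ for every } n \in \mathbb{N}, \]
\[ \liminf_{n\to\infty} x_n = \infty \quad \text{if } z_n \uparrow \infty \text{ as } n \to \infty. \]

In all cases, we have $z_n \le y_n$ for every $n \in \mathbb{N}$, with the usual ordering conventions
for $\pm\infty$, and by taking the limit as $n \to \infty$, we get that
\[ \liminf_{n\to\infty} x_n \le \limsup_{n\to\infty} x_n. \]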


We illustrate the definition of the lim sup and lim inf with a number of examples. 


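Example 3.35. Consider the bounded, increasing sequence
\[ 0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \dots, \qquad x_n = 1 - \frac{1}{n}. \]
Defining $y_n$ and $z_n$ as above, we have
\[ y_n = \sup\left\{1 - \frac{1}{k} : k \ge n\right\} = 1, \qquad z_n = \inf\left\{1 - \frac{1}{k} : k \ge n\right\} = 1 - \frac{1}{n}, \]
and both $y_n \downarrow 1$ and $z_n \uparrow 1$ converge monotonically to the limit 1 of the original
sequence. Thus,
\[ \limsup_{n\to\infty} x_n = \liminf_{n\to\infty} x_n = \lim_{n\to\infty} x_n = 1. \]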


Example 3.36. Consider the bounded, non-monotone sequence
\[ 1, -1, 1, -1, \dots, \qquad x_n = (-1)^{n+1}. \]


Figure 2. A plot of the first 40 terms in the sequence $x_n = (-1)^{n+1}(1 + 1/n)$
in Example. The dashed lines show $\limsup x_n = 1$ and $\liminf x_n = -1$.


Then
\[ y_n = \sup\{(-1)^{k+1} : k \ge n\} = 1, \qquad z_n = \inf\{(-1)^{k+1} : k \ge n\} = -1, \]
and $y_n \downarrow 1$, $z_n \uparrow -1$ converge to different limits. Thus,
\[ \limsup_{n\to\infty} x_n = 1, \qquad \liminf_{n\to\infty} x_n = -1. \]

The original sequence does not converge, and $\lim x_n$ is undefined.


Example 3.37. The bounded, non-monotone sequence
\[ 2, -\frac{3}{2}, \frac{4}{3}, -\frac{5}{4}, \dots, \qquad x_n = (-1)^{n+1}\left(1 + \frac{1}{n}\right) \]
is shown in Figure 2. We have
\[ y_n = \sup\{x_k : k \ge n\} = \begin{cases} 1 + 1/n & \text{if } n \text{ is odd,} \\ 1 + 1/(n+1) & \text{if } n \text{ is even,} \end{cases} \]
\[ z_n = \inf\{x_k : k \ge n\} = \begin{cases} -[1 + 1/(n+1)] & \text{if } n \text{ is odd,} \\ -[1 + 1/n] & \text{if } n \text{ is even,} \end{cases} \]
and it follows that
\[ \limsup_{n\to\infty} x_n = 1, \qquad \liminf_{n\to\infty} x_n = -1. \]

The limit of the sequence does not exist. Note that infinitely many terms of the
sequence are strictly greater than $\limsup x_n$, so $\limsup x_n$ does not bound any
“tail” of the sequence from above. However, every number strictly greater than
$\limsup x_n$ eventually bounds the sequence from above. Similarly, $\liminf x_n$ does
not bound any “tail” of the sequence from below, but every number strictly less
than $\liminf x_n$ eventually bounds the sequence from below.
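
The tail suprema and infima of Example 3.37 can be explored numerically. The sketch below is an illustration under an explicit assumption: a computer cannot take a supremum over an infinite tail, so a finite window of `HORIZON` terms stands in for it. For this particular sequence the positive terms decrease and the negative terms increase, so the finite window happens to give the exact values of $y_n$ and $z_n$ whenever $n$ is well below `HORIZON`; that would not be true for a general sequence. The names `x`, `tail_sup`, `tail_inf`, and `HORIZON` are introduced here for illustration.

```python
from fractions import Fraction

HORIZON = 2000  # assumption: a finite window standing in for the infinite tail


def x(n: int) -> Fraction:
    """x_n = (-1)^(n+1) * (1 + 1/n), the sequence of Example 3.37."""
    sign = 1 if n % 2 == 1 else -1
    return sign * (1 + Fraction(1, n))


def tail_sup(n: int) -> Fraction:
    """Maximum of x_k over n <= k <= HORIZON (a proxy for y_n)."""
    return max(x(k) for k in range(n, HORIZON + 1))


def tail_inf(n: int) -> Fraction:
    """Minimum of x_k over n <= k <= HORIZON (a proxy for z_n)."""
    return min(x(k) for k in range(n, HORIZON + 1))


for n in (1, 2, 10, 11, 100):
    print(n, tail_sup(n), tail_inf(n))
```

The printed values decrease toward 1 and increase toward -1, respectively, in agreement with the case formulas for $y_n$ and $z_n$ in the example.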


Example 3.38. Consider the unbounded, increasing sequence
\[ 1, 2, 3, 4, 5, \dots, \qquad x_n = n. \]
We have
\[ y_n = \sup\{x_k : k \ge n\} = \infty, \qquad z_n = \inf\{x_k : k \ge n\} = n, \]
so the lim sup, lim inf and lim all diverge to $\infty$,
\[ \limsup_{n\to\infty} x_n = \liminf_{n\to\infty} x_n = \lim_{n\to\infty} x_n = \infty. \]


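Example 3.39. Consider the unbounded, non-monotone sequence
\[ 1, -2, 3, -4, 5, \dots, \qquad x_n = \begin{cases} n & \text{if } n \text{ is odd,} \\ -n & \text{if } n \text{ is even.} \end{cases} \]
We have $y_n = \infty$ and $z_n = -\infty$ for every $n \in \mathbb{N}$, and
\[ \limsup_{n\to\infty} x_n = \infty, \qquad \liminf_{n\to\infty} x_n = -\infty. \]

The sequence oscillates and does not diverge to either $\infty$ or $-\infty$, so $\lim x_n$ is
undefined even as an extended real number.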


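Example 3.40. Consider the unbounded non-monotone sequence
\[ 1, \frac{1}{2}, 3, \frac{1}{4}, 5, \dots, \qquad x_n = \begin{cases} n & \text{if } n \text{ is odd,} \\ 1/n & \text{if } n \text{ is even.} \end{cases} \]

Then $y_n = \infty$ for every $n \in \mathbb{N}$. Every tail of the sequence contains the terms $1/k$
for all sufficiently large even $k$, which are positive and decrease to 0, so
\[ z_n = \inf\{x_k : k \ge n\} = 0 \quad \text{for every } n \in \mathbb{N}. \]
Therefore
\[ \limsup_{n\to\infty} x_n = \infty, \qquad \liminf_{n\to\infty} x_n = 0. \]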


As noted above, the limsup of a sequence needn’t bound any tail of the se- 
quence, but the sequence is eventually bounded from above by every number that 
is strictly greater than the limsup, and the sequence is greater infinitely often 
than every number that is strictly less than the limsup. This property gives an 
alternative characterization of the lim sup, one that we often use in practice. 


Theorem 3.41. Let $(x_n)$ be a real sequence. Then
\[ y = \limsup_{n\to\infty} x_n \]
if and only if $-\infty \le y \le \infty$ satisfies one of the following conditions.

(1) $-\infty < y < \infty$ and for every $\epsilon > 0$: (a) there exists $N \in \mathbb{N}$ such that $x_n < y + \epsilon$
for all $n > N$; (b) for every $N \in \mathbb{N}$ there exists $n > N$ such that $x_n > y - \epsilon$.

(2) $y = \infty$ and for every $M \in \mathbb{R}$, there exists $n \in \mathbb{N}$ such that $x_n > M$, i.e., $(x_n)$
is not bounded from above.

(3) $y = -\infty$ and for every $m \in \mathbb{R}$ there exists $N \in \mathbb{N}$ such that $x_n < m$ for all
$n > N$, i.e., $x_n \to -\infty$ as $n \to \infty$.

Similarly,
\[ z = \liminf_{n\to\infty} x_n \]
if and only if $-\infty \le z \le \infty$ satisfies one of the following conditions.

(1) $-\infty < z < \infty$ and for every $\epsilon > 0$: (a) there exists $N \in \mathbb{N}$ such that $x_n > z - \epsilon$
for all $n > N$; (b) for every $N \in \mathbb{N}$ there exists $n > N$ such that $x_n < z + \epsilon$.

(2) $z = -\infty$ and for every $m \in \mathbb{R}$, there exists $n \in \mathbb{N}$ such that $x_n < m$, i.e., $(x_n)$
is not bounded from below.

(3) $z = \infty$ and for every $M \in \mathbb{R}$ there exists $N \in \mathbb{N}$ such that $x_n > M$ for all
$n > N$, i.e., $x_n \to \infty$ as $n \to \infty$.


Proof. We prove the result for lim sup. The result for lim inf follows by applying
this result to the sequence $(-x_n)$.

First, suppose that $y = \limsup x_n$ and $-\infty < y < \infty$. Then $(x_n)$ is bounded
from above and
\[ y_n = \sup\{x_k : k \ge n\} \downarrow y \quad \text{as } n \to \infty. \]
Therefore, for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that $y_N < y + \epsilon$. Since $x_n \le y_N$
for all $n > N$, this proves (1a). To prove (1b), let $\epsilon > 0$ and suppose that $N \in \mathbb{N}$ is
arbitrary. Since $y_N \ge y$ is the supremum of $\{x_n : n \ge N\}$, there exists $n \ge N$ such
that $x_n > y_N - \epsilon \ge y - \epsilon$, which proves (1b).

Conversely, suppose that $-\infty < y < \infty$ satisfies condition (1) for the lim sup.
Then, given any $\epsilon > 0$, (1a) implies that there exists $N \in \mathbb{N}$ such that
\[ y_n = \sup\{x_k : k \ge n\} \le y + \epsilon \quad \text{for all } n > N, \]
and (1b) implies that $y_n > y - \epsilon$ for all $n \in \mathbb{N}$. Hence, $|y_n - y| \le \epsilon$ for all $n > N$,
so $y_n \to y$ as $n \to \infty$, which means that $y = \limsup x_n$.

We leave the verification of the equivalence for $y = \pm\infty$ as an exercise.


Next we give a necessary and sufficient condition for the convergence of a se- 
quence in terms of its liminf and lim sup. 


Theorem 3.42. A sequence $(x_n)$ of real numbers converges if and only if
\[ \liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = x \]
are finite and equal, in which case
\[ \lim_{n\to\infty} x_n = x. \]
Furthermore, the sequence diverges to $\infty$ if and only if
\[ \liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = \infty \]
and diverges to $-\infty$ if and only if
\[ \liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = -\infty. \]


Proof. First suppose that
\[ \liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = x \]
for some $x \in \mathbb{R}$. Then $y_n \downarrow x$ and $z_n \uparrow x$ as $n \to \infty$ where
\[ y_n = \sup\{x_k : k \ge n\}, \qquad z_n = \inf\{x_k : k \ge n\}. \]
Since $z_n \le x_n \le y_n$, the “sandwich” theorem implies that
\[ \lim_{n\to\infty} x_n = x. \]

Conversely, suppose that the sequence $(x_n)$ converges to a limit $x \in \mathbb{R}$. Then
for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that
\[ x - \epsilon < x_n < x + \epsilon \quad \text{for all } n > N. \]
It follows that
\[ x - \epsilon \le z_n \le y_n \le x + \epsilon \quad \text{for all } n > N. \]
Therefore $y_n, z_n \to x$ as $n \to \infty$, so $\limsup x_n = \liminf x_n = x$.

The sequence $(x_n)$ diverges to $\infty$ if and only if $\liminf x_n = \infty$, and then
$\limsup x_n = \infty$, since $\liminf x_n \le \limsup x_n$. Similarly, $(x_n)$ diverges to $-\infty$ if
and only if $\limsup x_n = -\infty$, and then $\liminf x_n = -\infty$.


If $\liminf x_n \ne \limsup x_n$, then we say that the sequence $(x_n)$ oscillates. The
difference
\[ \limsup x_n - \liminf x_n \]
provides a measure of the size of the oscillations in the sequence as $n \to \infty$.

Every sequence has a finite or infinite lim sup, but not every sequence has a
limit (even if we include sequences that diverge to $\pm\infty$). The following corollary
gives a convenient way to prove the convergence of a sequence without having to
refer to the limit before it is known to exist.


Corollary 3.43. Let $(x_n)$ be a sequence of real numbers. Then $(x_n)$ converges
with $\lim_{n\to\infty} x_n = x$ if and only if $\limsup_{n\to\infty} |x_n - x| = 0$.

Proof. If $\lim_{n\to\infty} x_n = x$, then $\lim_{n\to\infty} |x_n - x| = 0$, so
\[ \limsup_{n\to\infty} |x_n - x| = \lim_{n\to\infty} |x_n - x| = 0. \]

Conversely, if $\limsup_{n\to\infty} |x_n - x| = 0$, then
\[ 0 \le \liminf_{n\to\infty} |x_n - x| \le \limsup_{n\to\infty} |x_n - x| = 0, \]
so $\liminf_{n\to\infty} |x_n - x| = \limsup_{n\to\infty} |x_n - x| = 0$. Theorem implies that
$\lim_{n\to\infty} |x_n - x| = 0$, or $\lim_{n\to\infty} x_n = x$.

Note that the condition $\liminf_{n\to\infty} |x_n - x| = 0$ doesn’t tell us anything about
the convergence of $(x_n)$.

Example 3.44. Let $x_n = 1 + (-1)^n$. Then $(x_n)$ oscillates between 0 and 2, and
\[ \liminf_{n\to\infty} x_n = 0, \qquad \limsup_{n\to\infty} x_n = 2. \]

The sequence is non-negative and its lim inf is 0, but the sequence does not converge.




3.7. Cauchy sequences 


Cauchy has become unbearable. Every Monday, broadcasting the known 
facts he has learned over the week as a discovery. I believe there is no 
historical precedent for such a talent writing so much awful rubbish. This 
is why I have relegated him to the rank below us. (Jacobi in a letter to 
Dirichlet, 1841) 


The Cauchy condition is a necessary and sufficient condition for the convergence 
of a real sequence that depends only on the terms of the sequence and not on its 
limit. Furthermore, the completeness of R can be defined by the convergence of 
Cauchy sequences, instead of by the existence of suprema. This approach defines 
completeness in terms of the distance properties of R rather than its order properties 
and generalizes to other metric spaces that don’t have a natural ordering. 


Roughly speaking, a Cauchy sequence is a sequence whose terms eventually get 
arbitrarily close together. 


Definition 3.45. A sequence $(x_n)$ of real numbers is a Cauchy sequence if for
every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
\[ |x_m - x_n| < \epsilon \quad \text{for all } m, n > N. \]

Theorem 3.46. A sequence of real numbers converges if and only if it is a Cauchy
sequence.

Proof. First suppose that $(x_n)$ converges to a limit $x \in \mathbb{R}$. Then for every $\epsilon > 0$
there exists $N \in \mathbb{N}$ such that
\[ |x_n - x| < \frac{\epsilon}{2} \quad \text{for all } n > N. \]
It follows that if $m, n > N$, then
\[ |x_m - x_n| \le |x_m - x| + |x - x_n| < \epsilon, \]
which implies that $(x_n)$ is Cauchy. (This direction doesn’t use the completeness of
$\mathbb{R}$; for example, it holds equally well for sequences of rational numbers that converge
in $\mathbb{Q}$.)

Conversely, suppose that $(x_n)$ is Cauchy. Then there is $N_1 \in \mathbb{N}$ such that
\[ |x_m - x_n| < 1 \quad \text{for all } m, n > N_1. \]
It follows that if $n > N_1$, then
\[ |x_n| \le |x_n - x_{N_1+1}| + |x_{N_1+1}| \le 1 + |x_{N_1+1}|. \]
Hence the sequence is bounded with
\[ |x_n| \le \max\{|x_1|, |x_2|, \dots, |x_{N_1}|, 1 + |x_{N_1+1}|\}. \]
Since the sequence is bounded, its lim sup and lim inf exist. We claim they are
equal. Given $\epsilon > 0$, choose $N \in \mathbb{N}$ such that the Cauchy condition in Definition 3.45
holds. Then
\[ x_n - \epsilon < x_m < x_n + \epsilon \quad \text{for all } m \ge n > N. \]
It follows that for all $n > N$ we have
\[ x_n - \epsilon \le \inf\{x_m : m \ge n\}, \qquad \sup\{x_m : m \ge n\} \le x_n + \epsilon, \]
which implies that
\[ \sup\{x_m : m \ge n\} - \epsilon \le \inf\{x_m : m \ge n\} + \epsilon. \]
Taking the limit as $n \to \infty$, we get that
\[ \limsup_{n\to\infty} x_n - \epsilon \le \liminf_{n\to\infty} x_n + \epsilon, \]
and since $\epsilon > 0$ is arbitrary, we have
\[ \limsup_{n\to\infty} x_n \le \liminf_{n\to\infty} x_n. \]
It follows that $\limsup x_n = \liminf x_n$, so Theorem implies that the sequence
converges.


3.8. Subsequences 


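A subsequence of a sequence $(x_n)$,
\[ x_1, x_2, x_3, \dots, x_n, \dots \]
is a sequence $(x_{n_k})$ of the form
\[ x_{n_1}, x_{n_2}, x_{n_3}, \dots, x_{n_k}, \dots \]
where $n_1 < n_2 < n_3 < \dots < n_k < \dots$.

Example 3.47. A subsequence of the sequence $(1/n)$,
\[ 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \dots \]
is the sequence $(1/k^2)$,
\[ 1, \frac{1}{4}, \frac{1}{9}, \frac{1}{16}, \frac{1}{25}, \dots \]
Here, $n_k = k^2$. On the other hand, the sequences
\[ 1, 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \dots, \qquad \frac{1}{2}, 1, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \dots \]
aren’t subsequences of $(1/n)$ since $n_k$ is not a strictly increasing function of $k$ in
either case.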


The standard short-hand notation for subsequences used above is convenient 
but not entirely consistent, and the notion of a subsequence is a bit more involved 
than it might appear at first sight. To explain it in more detail, we give the formal 
definition of a subsequence as a function on N. 


Definition 3.48. Let $(x_n)$ be a sequence, where $x_n = f(n)$ and $f : \mathbb{N} \to \mathbb{R}$. A
sequence $(y_k)$, where $y_k = g(k)$ and $g : \mathbb{N} \to \mathbb{R}$, is a subsequence of $(x_n)$ if there is
a strictly increasing function $\phi : \mathbb{N} \to \mathbb{N}$ such that $g = f \circ \phi$. In that case, we write
$\phi(k) = n_k$ and $y_k = x_{n_k}$.

Example 3.49. In Example 3.47, the sequence $(1/n)$ corresponds to the function
$f(n) = 1/n$ and the subsequence $(1/k^2)$ corresponds to $g(k) = 1/k^2$. Here, $g = f \circ \phi$
with $\phi(k) = k^2$.
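
The functional point of view in Definition 3.48 translates directly into code. The following minimal sketch treats a sequence as a function on the positive integers and a subsequence as a composition with an index map. The names `compose` and `is_strictly_increasing` are hypothetical helpers introduced here, and the monotonicity check only inspects finitely many indices, so it is a spot check and not a proof.

```python
from fractions import Fraction
from typing import Callable


def f(n: int) -> Fraction:
    """The sequence x_n = 1/n, viewed as a function on the positive integers."""
    return Fraction(1, n)


def phi(k: int) -> int:
    """Index map n_k = k^2 from Example 3.49."""
    return k * k


def compose(f: Callable[[int], Fraction], phi: Callable[[int], int]) -> Callable[[int], Fraction]:
    """Return g = f o phi, so that g(k) = x_{n_k}."""
    return lambda k: f(phi(k))


def is_strictly_increasing(phi: Callable[[int], int], upto: int) -> bool:
    """Check phi(k) < phi(k+1) for 1 <= k < upto (finitely many indices only)."""
    return all(phi(k) < phi(k + 1) for k in range(1, upto))


g = compose(f, phi)
print([g(k) for k in range(1, 6)])       # the terms 1, 1/4, 1/9, 1/16, 1/25
print(is_strictly_increasing(phi, 100))  # the index map passes the spot check
```

An index map that repeats a value, such as $k \mapsto \max(1, k - 1)$, fails the check, which matches the first non-example in Example 3.47.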


Note that since the indices in a subsequence form a strictly increasing sequence
of integers $(n_k)$, it follows that $n_k \to \infty$ as $k \to \infty$.


Proposition 3.50. Every subsequence of a convergent sequence converges to the
limit of the sequence.

Proof. Suppose that $(x_n)$ is a convergent sequence with $\lim_{n\to\infty} x_n = x$ and $(x_{n_k})$
is a subsequence. Let $\epsilon > 0$. There exists $N \in \mathbb{N}$ such that $|x_n - x| < \epsilon$ for all
$n > N$. Since $n_k \to \infty$ as $k \to \infty$, there exists $K \in \mathbb{N}$ such that $n_k > N$ if $k > K$.
Then $k > K$ implies that $|x_{n_k} - x| < \epsilon$, so $\lim_{k\to\infty} x_{n_k} = x$.


A useful criterion for the divergence of a sequence follows immediately from 
this result and the uniqueness of limits. 


Corollary 3.51. If a sequence has subsequences that converge to different limits, 
then the sequence diverges. 


Example 3.52. The sequence $((-1)^{n+1})$,
\[ 1, -1, 1, -1, 1, \dots, \]
has subsequences $(1)$ and $(-1)$ that converge to different limits, so it diverges.

In general, we define the limit set of a sequence to be the set of all limits of its
convergent subsequences.

Definition 3.53. The limit set of a sequence $(x_n)$ is the set
\[ \{x \in \mathbb{R} : \text{there is a subsequence } (x_{n_k}) \text{ such that } x_{n_k} \to x \text{ as } k \to \infty\} \]
of limits of all of its convergent subsequences.

The limit set of a convergent sequence consists of a single point, namely its
limit.

Example 3.54. The limit set of the divergent sequence $((-1)^{n+1})$,
\[ 1, -1, 1, -1, 1, \dots, \]
contains two points, and is $\{-1, 1\}$.


Example 3.55. Let $\{r_n : n \in \mathbb{N}\}$ be an enumeration of the rational numbers
in $[0, 1]$. Every $x \in [0, 1]$ is a limit of a subsequence $(r_{n_k})$. To obtain such a
subsequence recursively, choose $n_1 = 1$, and for each $k \ge 2$ choose a rational
number $r_{n_k}$ such that $|x - r_{n_k}| < 1/k$ and $n_k > n_{k-1}$. This is always possible since
the rational numbers are dense in $[0, 1]$ and every interval contains infinitely many
terms of the sequence. Conversely, if $r_{n_k} \to x$, then $0 \le x \le 1$ since $0 \le r_{n_k} \le 1$.
Thus, the limit set of $(r_n)$ is the interval $[0, 1]$.

Finally, we state a characterization of the lim sup and lim inf of a sequence in
terms of its limit set, where we use the usual conventions about $\pm\infty$. We leave
the proof as an exercise.


Theorem 3.56. Suppose that $(x_n)$ is a sequence of real numbers with limit set $S$.
Then
\[ \limsup_{n\to\infty} x_n = \sup S, \qquad \liminf_{n\to\infty} x_n = \inf S. \]


3.9. The Bolzano-Weierstrass theorem 


The Bolzano-Weierstrass theorem is a fundamental compactness result. It allows 
us to deduce the convergence of a subsequence from the boundedness of a sequence 
without having to know anything specific about the limit. In this respect, it is anal- 
ogous to the result that a monotone increasing sequence converges if it is bounded 
from above, and it also provides another way of expressing the completeness of R. 


Theorem 3.57 (Bolzano-Weierstrass). Every bounded sequence of real numbers 
has a convergent subsequence. 


Proof. Suppose that $(x_n)$ is a bounded sequence of real numbers. Let
\[ M = \sup_{n\in\mathbb{N}} x_n, \qquad m = \inf_{n\in\mathbb{N}} x_n, \]
and define the closed interval $I_0 = [m, M]$.

Divide $I_0 = L_0 \cup R_0$ in half into two closed intervals, where
\[ L_0 = [m, (m + M)/2], \qquad R_0 = [(m + M)/2, M]. \]
At least one of the intervals $L_0$, $R_0$ contains infinitely many terms of the sequence,
meaning that $x_n \in L_0$ or $x_n \in R_0$ for infinitely many $n \in \mathbb{N}$ (even if the terms
themselves are repeated).

Choose $I_1$ to be one of the intervals $L_0$, $R_0$ that contains infinitely many terms
and choose $n_1 \in \mathbb{N}$ such that $x_{n_1} \in I_1$. Divide $I_1 = L_1 \cup R_1$ in half into two
closed intervals. One or both of the intervals $L_1$, $R_1$ contains infinitely many terms
of the sequence. Choose $I_2$ to be one of these intervals and choose $n_2 > n_1$ such
that $x_{n_2} \in I_2$. This is always possible because $I_2$ contains infinitely many terms
of the sequence. Divide $I_2$ in half, pick a closed half-interval $I_3$ that contains
infinitely many terms, and choose $n_3 > n_2$ such that $x_{n_3} \in I_3$. Continuing in this
way, we get a nested sequence of intervals $I_1 \supset I_2 \supset I_3 \supset \dots \supset I_k \supset \dots$ of length
$|I_k| = 2^{-k}(M - m)$, together with a subsequence $(x_{n_k})$ such that $x_{n_k} \in I_k$.

Let $\epsilon > 0$ be given. Since $|I_k| \to 0$ as $k \to \infty$, there exists $K \in \mathbb{N}$ such
that $|I_k| < \epsilon$ for all $k > K$. Furthermore, since the intervals are nested, $x_{n_k} \in I_{K+1}$
for all $k > K$, and $|I_{K+1}| < \epsilon$, so we have
$|x_{n_j} - x_{n_k}| < \epsilon$ for all $j, k > K$. This proves that $(x_{n_k})$ is a Cauchy sequence, and
therefore it converges by Theorem 3.46.
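
The bisection argument in the proof can be imitated on a finite sample of a sequence. The sketch below is an illustration only, under a loudly stated assumption: the proof selects a half-interval containing infinitely many terms, which no finite computation can decide, so the code substitutes the heuristic "keep the half containing at least as many of the sampled terms". The function name `bisection_subsequence`, the sample size, and the number of steps are all choices made here, and the test sequence is the one from Example 3.37.

```python
def x(n: int) -> float:
    """x_n = (-1)^(n+1) * (1 + 1/n), the sequence of Example 3.37."""
    sign = 1.0 if n % 2 == 1 else -1.0
    return sign * (1.0 + 1.0 / n)


def bisection_subsequence(xs: list[float], steps: int):
    """Pick indices n_1 < n_2 < ... by repeated halving of [min, max].

    xs[i - 1] plays the role of x_i. Returns the 1-based indices chosen
    and the final interval (lo, hi).
    """
    lo, hi = min(xs), max(xs)
    indices: list[int] = []
    last = 0
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [i for i, v in enumerate(xs, start=1) if lo <= v <= mid]
        right = [i for i, v in enumerate(xs, start=1) if mid <= v <= hi]
        # Heuristic stand-in for "contains infinitely many terms":
        # keep the half that holds at least as many sampled terms.
        if len(left) >= len(right):
            hi, chosen = mid, left
        else:
            lo, chosen = mid, right
        later = [i for i in chosen if i > last]
        if not later:  # the finite sample is exhausted
            break
        last = later[0]
        indices.append(last)
    return indices, (lo, hi)


sample = [x(n) for n in range(1, 4001)]
indices, (lo, hi) = bisection_subsequence(sample, steps=8)
print(indices)
print([x(n) for n in indices])
```

With ties broken toward the left half, the selected terms approach the lim inf, -1; breaking ties the other way would select a subsequence approaching the lim sup, 1. This reflects the non-uniqueness discussed after the proof.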


The subsequence obtained in the proof of this theorem is not unique. In particular,
if the sequence does not converge, then for some $k \in \mathbb{N}$ both the left and right
intervals $L_k$ and $R_k$ contain infinitely many terms of the sequence. In that case, we
can obtain convergent subsequences with different limits, depending on our choice
of $L_k$ or $R_k$. This loss of uniqueness is a typical feature of compactness arguments.


We can, however, use the Bolzano-Weierstrass theorem to give a criterion for 
the convergence of a sequence in terms of the convergence of its subsequences. It 
states that if every convergent subsequence of a bounded sequence has the same 
limit, then the entire sequence converges to that limit. 


Theorem 3.58. If $(x_n)$ is a bounded sequence of real numbers such that every
convergent subsequence has the same limit $x$, then $(x_n)$ converges to $x$.

Proof. We will prove that if a bounded sequence $(x_n)$ does not converge to $x$, then
it has a convergent subsequence whose limit is not equal to $x$.

If $(x_n)$ does not converge to $x$ then there exists $\epsilon_0 > 0$ such that $|x_n - x| \ge \epsilon_0$
for infinitely many $n \in \mathbb{N}$. We can therefore find a subsequence $(x_{n_k})$ such that
\[ |x_{n_k} - x| \ge \epsilon_0 \]
for every $k \in \mathbb{N}$. The subsequence $(x_{n_k})$ is bounded, since $(x_n)$ is bounded, so by
the Bolzano-Weierstrass theorem, it has a convergent subsequence $(x_{n_{k_j}})$. If
\[ \lim_{j\to\infty} x_{n_{k_j}} = y, \]
then $|x - y| \ge \epsilon_0$, so $x \ne y$, which proves the result.
Chapter 4 


Series 


Divergent series are the devil, and it is a shame to base on them 
any demonstration whatsoever. (Niels Henrik Abel, 1826) 


This series is divergent, therefore we may be able to do something 
with it. (Oliver Heaviside, quoted by Kline) 


In this chapter, we apply our results for sequences to series, or infinite sums. 
The convergence and sum of an infinite series is defined in terms of its sequence of 
finite partial sums. 


4.1. Convergence of series 


A finite sum of real numbers is well-defined by the algebraic properties of R, but in 
order to make sense of an infinite series, we need to consider its convergence. We 
say that a series converges if its sequence of partial sums converges, and in that 
case we define the sum of the series to be the limit of its partial sums. 


Definition 4.1. Let $(a_n)$ be a sequence of real numbers. The series
\[ \sum_{n=1}^{\infty} a_n \]
converges to a sum $S \in \mathbb{R}$ if the sequence $(S_n)$ of partial sums
\[ S_n = \sum_{k=1}^{n} a_k \]
converges to $S$ as $n \to \infty$. Otherwise, the series diverges.

If a series converges to $S$, we write
\[ S = \sum_{n=1}^{\infty} a_n. \]

We also say a series diverges to $\pm\infty$ if its sequence of partial sums does. As for
sequences, we may start a series at other values of $n$ than $n = 1$ without changing
its convergence properties. It is sometimes convenient to omit the limits on a series
when they aren’t important, and write it as $\sum a_n$.


Example 4.2. If $|a| < 1$, then the geometric series with ratio $a$ converges and its
sum is
\[ \sum_{n=0}^{\infty} a^n = \frac{1}{1 - a}. \]

This series is simple enough that we can compute its partial sums explicitly,
\[ S_n = \sum_{k=0}^{n} a^k = \frac{1 - a^{n+1}}{1 - a}. \]

As shown in Proposition 3.31, if $|a| < 1$, then $a^n \to 0$ as $n \to \infty$, so that $S_n \to
1/(1 - a)$, which proves the result.
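
The closed form for the partial sums is easy to confirm with exact arithmetic. In this sketch the function names `partial_sum` and `closed_form` and the sample ratio $a = -2/3$ are assumptions chosen for illustration; any rational $a \ne 1$ would do for the identity, and $|a| < 1$ is needed for the convergence.

```python
from fractions import Fraction


def partial_sum(a: Fraction, n: int) -> Fraction:
    """S_n = a^0 + a^1 + ... + a^n, summed term by term."""
    return sum((a**k for k in range(n + 1)), Fraction(0))


def closed_form(a: Fraction, n: int) -> Fraction:
    """(1 - a^(n+1)) / (1 - a), valid for a != 1."""
    return (1 - a ** (n + 1)) / (1 - a)


a = Fraction(-2, 3)   # sample ratio with |a| < 1
for n in (0, 1, 5, 20):
    assert partial_sum(a, n) == closed_form(a, n)

limit = 1 / (1 - a)                     # the claimed sum of the series
gap = abs(partial_sum(a, 50) - limit)   # equals |a|^51 / (1 - a)
print(limit, float(gap))
```

The gap between $S_n$ and the sum is exactly $|a|^{n+1}/|1 - a|$, so it shrinks geometrically.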

The geometric series diverges to $\infty$ if $a \ge 1$, and diverges in an oscillatory
fashion if $a \le -1$. The following examples consider the cases $a = \pm 1$ in more
detail.


Example 4.3. The series
\[ \sum_{n=1}^{\infty} 1 = 1 + 1 + 1 + \dots \]
diverges to $\infty$, since its $n$th partial sum is $S_n = n$.

Example 4.4. The series
\[ \sum_{n=1}^{\infty} (-1)^{n+1} = 1 - 1 + 1 - 1 + \dots \]
diverges, since its partial sums
\[ S_n = \begin{cases} 1 & \text{if } n \text{ is odd,} \\ 0 & \text{if } n \text{ is even,} \end{cases} \]
oscillate between 0 and 1.

This series illustrates the dangers of blindly applying algebraic rules for finite
sums to series. For example, one might argue that
\[ S = (1 - 1) + (1 - 1) + (1 - 1) + \dots = 0 + 0 + 0 + \dots = 0, \]
or that
\[ S = 1 + (-1 + 1) + (-1 + 1) + \dots = 1 + 0 + 0 + \dots = 1, \]
or that
\[ 1 - S = 1 - (1 - 1 + 1 - 1 + \dots) = 1 - 1 + 1 - 1 + 1 - \dots = S, \]
so $2S = 1$ or $S = 1/2$.

The Italian mathematician and priest Luigi Grandi (1710) suggested that these 
results were evidence in favor of the existence of God, since they showed that it 
was possible to create something out of nothing. 


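Telescoping series of the form
\[ \sum_{n=1}^{\infty} (a_n - a_{n+1}) \]
are another class of series whose partial sums
\[ S_n = a_1 - a_{n+1} \]
can be computed explicitly and then used to study their convergence. We give one
example.

Example 4.5. The series
\[ \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} + \frac{1}{4 \cdot 5} + \dots \]
converges to 1. To show this, we observe that
\[ \frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}, \]
so
\[ \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k+1}\right) \]
\[ = \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \dots + \frac{1}{n} - \frac{1}{n+1} \]
\[ = 1 - \frac{1}{n+1}, \]
and it follows that
\[ \sum_{k=1}^{\infty} \frac{1}{k(k+1)} = 1. \]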
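
The telescoping identity in Example 4.5 can be verified exactly for as many partial sums as we like. This is a small sketch; the function name `S` and the range of $n$ tested are choices made here.

```python
from fractions import Fraction


def S(n: int) -> Fraction:
    """Partial sum of 1/(k(k+1)) for k = 1, ..., n, added term by term."""
    return sum((Fraction(1, k * (k + 1)) for k in range(1, n + 1)), Fraction(0))


# The telescoped closed form 1 - 1/(n+1) should match the direct sum.
for n in range(1, 200):
    assert S(n) == 1 - Fraction(1, n + 1)

print(S(9), S(99))
```

Since $S_n = 1 - 1/(n+1)$, the partial sums increase to 1 and the error after $n$ terms is exactly $1/(n+1)$.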


A condition for the convergence of series with positive terms follows immedi- 
ately from the condition for the convergence of monotone sequences. 


Proposition 4.6. A series $\sum a_n$ with positive terms $a_n \ge 0$ converges if and only
if its partial sums
\[ \sum_{k=1}^{n} a_k \le M \]
are bounded from above, otherwise it diverges to $\infty$.

Proof. The partial sums $S_n = \sum_{k=1}^{n} a_k$ of such a series form a monotone increasing
sequence, and the result follows immediately from Theorem


Although we have only defined sums of convergent series, divergent series are
not necessarily meaningless. For example, the Cesàro sum $C$ of a series $\sum a_n$ is
defined by
\[ C = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} S_k, \qquad S_n = a_1 + a_2 + \dots + a_n. \]

That is, we average the first $n$ partial sums of the series, and let $n \to \infty$. One can
prove that if a series converges to $S$, then its Cesàro sum exists and is equal to $S$,
but a series may be Cesàro summable even if it is divergent.


Example 4.7. For the series $\sum (-1)^{n+1}$ in Example 4.4, we find that
\[ \frac{1}{n} \sum_{k=1}^{n} S_k = \begin{cases} 1/2 + 1/(2n) & \text{if } n \text{ is odd,} \\ 1/2 & \text{if } n \text{ is even,} \end{cases} \]
since the $S_n$’s alternate between 0 and 1. It follows that the Cesàro sum of the series is
$C = 1/2$. This is, in fact, what Grandi believed to be the “true” sum of the series.
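
The Cesàro means in Example 4.7 can be computed directly from the definition. The sketch below is illustrative: the function name `cesaro_means` is an assumption introduced here, and it works on a finite list of terms, so it shows the trend of the averages and not the limit itself.

```python
from fractions import Fraction
from itertools import accumulate


def cesaro_means(terms: list[int]) -> list[Fraction]:
    """Return the averages (S_1 + ... + S_n) / n for n = 1, ..., len(terms)."""
    partial_sums = list(accumulate(terms))   # S_1, S_2, ...
    running_totals = accumulate(partial_sums)  # S_1, S_1 + S_2, ...
    return [Fraction(total, n) for n, total in enumerate(running_totals, start=1)]


# The first ten terms of the series 1 - 1 + 1 - 1 + ...
grandi = [(-1) ** (n + 1) for n in range(1, 11)]
means = cesaro_means(grandi)
print(means)
```

The even-indexed averages are exactly 1/2 and the odd-indexed ones are $1/2 + 1/(2n)$, matching the case formula above.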


Cesàro summation is important in the theory of Fourier series. There are also
many other ways to sum a divergent series or assign a meaning to it (for example, 
as an asymptotic series), but we won’t discuss them further here. 


4.2. The Cauchy condition 


The following Cauchy condition for the convergence of series is an immediate con- 
sequence of the Cauchy condition for the sequence of partial sums. 


Theorem 4.8 (Cauchy condition). The series
\[ \sum_{n=1}^{\infty} a_n \]
converges if and only if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
\[ \left|\sum_{k=m+1}^{n} a_k\right| = |a_{m+1} + a_{m+2} + \dots + a_n| < \epsilon \quad \text{for all } n > m > N. \]

Proof. The series converges if and only if the sequence $(S_n)$ of partial sums is
Cauchy, meaning that for every $\epsilon > 0$ there exists $N$ such that
\[ |S_n - S_m| = \left|\sum_{k=m+1}^{n} a_k\right| < \epsilon \quad \text{for all } n > m > N, \]
which proves the result.


A special case of this theorem is a necessary condition for the convergence of 
a series, namely that its terms approach zero. This condition is the first thing to 
check when considering whether or not a given series converges. 


Theorem 4.9. If the series
\[ \sum_{n=1}^{\infty} a_n \]
converges, then
\[ \lim_{n\to\infty} a_n = 0. \]

Proof. If the series converges, then it is Cauchy. Taking $m = n - 1$ in the Cauchy
condition in Theorem 4.8, we find that for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
$|a_n| < \epsilon$ for all $n > N + 1$, which proves that $a_n \to 0$ as $n \to \infty$.

Example 4.10. The geometric series $\sum a^n$ converges if $|a| < 1$ and in that case
$a^n \to 0$ as $n \to \infty$. If $|a| \ge 1$, then $a^n \not\to 0$ as $n \to \infty$, which implies that the series
diverges.


The condition that the terms of a series approach zero is not, however, sufficient 
to imply convergence. The following series is a fundamental example. 


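Example 4.11. The harmonic series
\[ \sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots \]
diverges, even though $1/n \to 0$ as $n \to \infty$. To see this, we collect the terms in
successive groups of powers of two,
\[ \sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \left(\frac{1}{9} + \frac{1}{10} + \dots + \frac{1}{16}\right) + \dots \]
\[ > 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \left(\frac{1}{16} + \frac{1}{16} + \dots + \frac{1}{16}\right) + \dots \]
\[ > 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \dots. \]
In general, for every $n \ge 1$, we have
\[ \sum_{k=1}^{2^{n+1}} \frac{1}{k} = 1 + \frac{1}{2} + \sum_{j=1}^{n} \sum_{k=2^j+1}^{2^{j+1}} \frac{1}{k} \]
\[ > 1 + \frac{1}{2} + \sum_{j=1}^{n} \sum_{k=2^j+1}^{2^{j+1}} \frac{1}{2^{j+1}} \]
\[ = 1 + \frac{1}{2} + \sum_{j=1}^{n} \frac{1}{2} \]
\[ = \frac{n}{2} + \frac{3}{2}, \]
so the series diverges. We can similarly obtain an upper bound for the partial sums,
\[ \sum_{k=1}^{2^{n+1}} \frac{1}{k} < 1 + \frac{1}{2} + \sum_{j=1}^{n} \sum_{k=2^j+1}^{2^{j+1}} \frac{1}{2^j} = n + \frac{3}{2}. \]

These inequalities are rather crude, but they show that the series diverges at a
logarithmic rate, since the sum of $2^n$ terms is of the order $n$. This rate of divergence
is very slow. It takes 12367 terms for the partial sums of the harmonic series to exceed
10, and more than $1.5 \times 10^{43}$ terms for the partial sums to exceed 100.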
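
Both the crude bounds and the count of 12367 terms can be checked by direct computation. The following sketch makes two assumptions worth stating: the function names `harmonic_exact` and `first_index_exceeding` are introduced here, and the second function adds terms in floating point, which is adequate for a target of 10 because the partial sums on either side of the crossing differ from 10 by far more than the accumulated rounding error. The target of 100 is out of reach of any term-by-term computation.

```python
from fractions import Fraction


def harmonic_exact(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n as an exact rational number."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def first_index_exceeding(target: float) -> int:
    """Smallest n with H_n > target, using floating-point accumulation."""
    total, n = 0.0, 0
    while total <= target:
        n += 1
        total += 1.0 / n
    return n


# Check n/2 + 3/2 < H_{2^(n+1)} < n + 3/2 for a few small n.
for n in range(1, 8):
    h = harmonic_exact(2 ** (n + 1))
    assert Fraction(n, 2) + Fraction(3, 2) < h < n + Fraction(3, 2)

print(first_index_exceeding(10.0))
```

Doubling the number of terms adds only about $\log 2 \approx 0.69$ to the partial sum, which is why the upper and lower bounds above are both linear in $n$ when the number of terms is $2^{n+1}$.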


A more refined argument, using integration, shows that
\[ \lim_{n\to\infty} \left[\sum_{k=1}^{n} \frac{1}{k} - \log n\right] = \gamma, \]
where $\gamma \approx 0.5772$ is the Euler constant. (See Example 12.45.)
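
The slow approach of $\sum_{k=1}^{n} 1/k - \log n$ to the Euler constant can be observed numerically. This is a minimal sketch: the function name `harmonic` and the sample values of $n$ are choices made here, `math.fsum` is used to limit rounding error in the sum, and the reference value 0.5772156649 is the standard decimal expansion of the constant to ten places.

```python
import math


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n in floating point, summed accurately."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


# The difference H_n - log n for increasing n.
for n in (10, 1000, 100000):
    print(n, harmonic(n) - math.log(n))
```

The differences decrease toward the limit, and the excess over it is close to $1/(2n)$, so each extra decimal digit of accuracy costs about ten times as many terms.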




4.3. Absolutely convergent series 


There is an important distinction between absolutely and conditionally convergent 
series. 


Definition 4.12. The series
\[ \sum_{n=1}^{\infty} a_n \]
converges absolutely if
\[ \sum_{n=1}^{\infty} |a_n| \text{ converges,} \]
and converges conditionally if
\[ \sum_{n=1}^{\infty} a_n \text{ converges, but } \sum_{n=1}^{\infty} |a_n| \text{ diverges.} \]

We will show in Proposition 4.17 below that every absolutely convergent series
converges. For series with positive terms, there is no difference between convergence
and absolute convergence. Also note from Proposition that $\sum a_n$ converges
absolutely if and only if the partial sums $\sum_{k=1}^{n} |a_k|$ are bounded from above.


Example 4.13. The geometric series $\sum a^n$ is absolutely convergent if $|a| < 1$.

Example 4.14. The alternating harmonic series,
\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots \]
is not absolutely convergent since, as shown in Example 4.11, the harmonic series
diverges. It follows from Theorem 4.30 below that the alternating harmonic series
converges, so it is a conditionally convergent series. Its convergence is made possible
by the cancelation between terms of opposite signs.


As we show next, the convergence of an absolutely convergent series follows 
from the Cauchy condition. Moreover, the series of positive and negative terms 
in an absolutely convergent series converge separately. First, we introduce some 
convenient notation. 


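Definition 4.15. The positive and negative parts of a real number $a \in \mathbb{R}$ are given
by
\[ a^+ = \begin{cases} a & \text{if } a > 0, \\ 0 & \text{if } a \le 0, \end{cases} \qquad a^- = \begin{cases} 0 & \text{if } a \ge 0, \\ |a| & \text{if } a < 0. \end{cases} \]

It follows, in particular, that
\[ 0 \le a^+, a^- \le |a|, \qquad a = a^+ - a^-, \qquad |a| = a^+ + a^-. \]
We may then split a series of real numbers into its positive and negative parts.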


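Example 4.16. Consider the alternating harmonic series
\[
\sum_{n=1}^{\infty} a_n = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \dots
\]
Its positive and negative parts are given by
\[
\sum_{n=1}^{\infty} a_n^+ = 1 + 0 + \frac{1}{3} + 0 + \frac{1}{5} + 0 + \dots,
\]
\[
\sum_{n=1}^{\infty} a_n^- = 0 + \frac{1}{2} + 0 + \frac{1}{4} + 0 + \frac{1}{6} + \dots.
\]
Both of these series diverge to infinity, since the harmonic series diverges and
\[
\sum_{n=1}^{\infty} a_n^+ > \sum_{n=1}^{\infty} a_n^- = \frac{1}{2} \sum_{n=1}^{\infty} \frac{1}{n}.
\]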
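The splitting into positive and negative parts can be sketched in a few lines of Python. This is an illustration only; the helper names `pos_part`, `neg_part` and `split_partial_sums` are assumptions made for this sketch.

```python
def pos_part(a: float) -> float:
    """a^+ : equals a if a > 0 and 0 otherwise."""
    return a if a > 0 else 0.0

def neg_part(a: float) -> float:
    """a^- : equals |a| if a < 0 and 0 otherwise."""
    return -a if a < 0 else 0.0

def split_partial_sums(N: int):
    """Partial sums up to N of the alternating harmonic series and of its two parts."""
    terms = [(-1) ** (n + 1) / n for n in range(1, N + 1)]
    total = sum(terms)
    positive = sum(pos_part(t) for t in terms)
    negative = sum(neg_part(t) for t in terms)
    return total, positive, negative

# The full partial sums settle down, while both parts keep growing (slowly).
for N in (10, 1000, 100_000):
    print(N, split_partial_sums(N))
```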
Proposition 4.17. An absolutely convergent series converges. Moreover,
\[
\sum_{n=1}^{\infty} a_n
\]
converges absolutely if and only if the series
\[
\sum_{n=1}^{\infty} a_n^+, \qquad \sum_{n=1}^{\infty} a_n^-
\]
of positive and negative terms both converge. Furthermore, in that case
\[
\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-, \qquad
\sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} a_n^+ + \sum_{n=1}^{\infty} a_n^-.
\]


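Proof. If $\sum a_n$ is absolutely convergent, then $\sum |a_n|$ is convergent, so it satisfies
the Cauchy condition. Since
\[
\left| \sum_{k=m+1}^{n} a_k \right| \le \sum_{k=m+1}^{n} |a_k|,
\]
the series $\sum a_n$ also satisfies the Cauchy condition, and therefore it converges.

For the second part, note that
\[
0 \le \sum_{k=m+1}^{n} |a_k| = \sum_{k=m+1}^{n} a_k^+ + \sum_{k=m+1}^{n} a_k^-,
\]
\[
0 \le \sum_{k=m+1}^{n} a_k^+ \le \sum_{k=m+1}^{n} |a_k|,
\]
\[
0 \le \sum_{k=m+1}^{n} a_k^- \le \sum_{k=m+1}^{n} |a_k|,
\]
which shows that $\sum |a_n|$ is Cauchy if and only if both $\sum a_n^+$, $\sum a_n^-$ are Cauchy.
It follows that $\sum |a_n|$ converges if and only if both $\sum a_n^+$, $\sum a_n^-$ converge. In that
case, we have
\[
\begin{aligned}
\sum_{n=1}^{\infty} a_n &= \lim_{n \to \infty} \sum_{k=1}^{n} a_k \\
&= \lim_{n \to \infty} \left( \sum_{k=1}^{n} a_k^+ - \sum_{k=1}^{n} a_k^- \right) \\
&= \lim_{n \to \infty} \sum_{k=1}^{n} a_k^+ - \lim_{n \to \infty} \sum_{k=1}^{n} a_k^- \\
&= \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-,
\end{aligned}
\]
and similarly for $\sum |a_n|$, which proves the proposition.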


It is worth noting that this result depends crucially on the completeness of R. 


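Example 4.18. Suppose that $a_n^+, a_n^- \in \mathbb{Q}^+$ are positive rational numbers such that
\[
\sum_{n=1}^{\infty} a_n^+ = \sqrt{2}, \qquad \sum_{n=1}^{\infty} a_n^- = 2 - \sqrt{2},
\]
and let $a_n = a_n^+ - a_n^-$. Then
\[
\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^- = 2\sqrt{2} - 2 \notin \mathbb{Q},
\]
\[
\sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} a_n^+ + \sum_{n=1}^{\infty} a_n^- = 2 \in \mathbb{Q}.
\]

Thus, the series converges absolutely in $\mathbb{Q}$, but it doesn't converge in $\mathbb{Q}$.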


4.4. The comparison test 


One of the most useful ways of showing that a series is absolutely convergent is to 
compare it with a simpler series whose convergence is already known. 


Theorem 4.19 (Comparison test). Suppose that $b_n \ge 0$ and
\[
\sum_{n=1}^{\infty} b_n
\]
converges. If $|a_n| \le b_n$, then
\[
\sum_{n=1}^{\infty} a_n
\]
converges absolutely.

Proof. Since $\sum b_n$ converges it satisfies the Cauchy condition, and since
\[
\sum_{k=m+1}^{n} |a_k| \le \sum_{k=m+1}^{n} b_k,
\]
the series $\sum |a_n|$ also satisfies the Cauchy condition. Therefore $\sum a_n$ converges
absolutely.


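Example 4.20. The series
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \dots
\]
converges by comparison with the telescoping series in Example 4.5. We have
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \sum_{n=1}^{\infty} \frac{1}{(n+1)^2}
\]
and
\[
0 \le \frac{1}{(n+1)^2} < \frac{1}{n(n+1)}.
\]
We also get the explicit upper bound
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} < 1 + \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 2.
\]
In fact, the sum is
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.
\]
Mengoli (1644) posed the problem of finding this sum, which was solved by Euler
(1735). The evaluation of the sum is known as the Basel problem, perhaps after
Euler's birthplace in Switzerland.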


Example 4.21. The series in Example is a special case of the following series,
called the p-series,
\[
\sum_{n=1}^{\infty} \frac{1}{n^p},
\]
where $0 < p < \infty$. It follows by comparison with the harmonic series in Example 4.11
that the p-series diverges for $p \le 1$. (If it converged for some $p \le 1$, then
the harmonic series would also converge.) On the other hand, the p-series converges
for every $1 < p < \infty$. To show this, note that
\[
\frac{1}{2^p} + \frac{1}{3^p} < \frac{2}{2^p}, \qquad
\frac{1}{4^p} + \frac{1}{5^p} + \frac{1}{6^p} + \frac{1}{7^p} < \frac{4}{4^p},
\]
and so on, which implies that
\[
\sum_{n=1}^{2^N - 1} \frac{1}{n^p} < 1 + \frac{1}{2^{p-1}} + \frac{1}{4^{p-1}} + \frac{1}{8^{p-1}} + \dots + \frac{1}{2^{(N-1)(p-1)}} < \frac{1}{1 - 2^{1-p}}.
\]
Thus, the partial sums are bounded from above, so the series converges by Proposition.
An alternative proof of the convergence of the p-series for $p > 1$ and
divergence for $0 < p \le 1$, using the integral test, is given in Example 12.44.
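The dyadic-block bound in the preceding example can be checked numerically. The Python sketch below is illustrative only; `p_series_partial_sum` and `dyadic_bound` are names assumed for this sketch.

```python
def p_series_partial_sum(p: float, N: int) -> float:
    """Partial sum 1 + 1/2^p + ... + 1/N^p of the p-series."""
    return sum(1.0 / n ** p for n in range(1, N + 1))

def dyadic_bound(p: float) -> float:
    """Upper bound 1 / (1 - 2^(1-p)) from the block comparison; requires p > 1."""
    return 1.0 / (1.0 - 2.0 ** (1.0 - p))

# Partial sums up to 2^N - 1 stay below the bound for several exponents p > 1.
for p in (1.5, 2.0, 3.0):
    print(p, p_series_partial_sum(p, 2 ** 12 - 1), dyadic_bound(p))
```

The bound is not sharp (for $p = 2$ it gives 2, matching the explicit bound in Example 4.20), but it is enough to show that the partial sums are bounded.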




4.5. * The Riemann ζ-function

Example justifies the following definition.

Definition 4.22. The Riemann ζ-function is defined for $1 < s < \infty$ by
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.
\]

For instance, as stated in Example we have $\zeta(2) = \pi^2/6$. In fact, Euler
(1755) discovered a general formula for the value $\zeta(2n)$ of the ζ-function at even
natural numbers,
\[
\zeta(2n) = (-1)^{n+1} \frac{(2\pi)^{2n} B_{2n}}{2(2n)!}, \qquad n = 1, 2, 3, \dots,
\]
where the coefficients $B_{2n}$ are the Bernoulli numbers (see Example 10.19). In
particular,
\[
\zeta(4) = \frac{\pi^4}{90}, \qquad \zeta(6) = \frac{\pi^6}{945}, \qquad \zeta(8) = \frac{\pi^8}{9450}, \qquad \zeta(10) = \frac{\pi^{10}}{93555}.
\]
On the other hand, the values of the ζ-function at odd natural numbers are harder
to study. For instance,
\[
\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = 1.2020569\dots
\]
is called Apéry's constant. It was proved to be irrational by Apéry (1979) but a
simple explicit expression for $\zeta(3)$ is not known (and likely doesn't exist).

The Riemann ζ-function is intimately connected with number theory and the
distribution of primes. Every positive integer n has a unique factorization
\[
n = p_1^{\alpha_1} p_2^{\alpha_2} \dots p_k^{\alpha_k},
\]
where the $p_j$ are primes and the exponents $\alpha_j$ are positive integers. Using the
binomial expansion in Example 4.2, we have
\[
\left( 1 - \frac{1}{p^s} \right)^{-1} = 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \frac{1}{p^{4s}} + \dots.
\]
By expanding the products and rearranging the resulting sums, one can see that
\[
\zeta(s) = \prod_{p} \left( 1 - \frac{1}{p^s} \right)^{-1},
\]
where the product is taken over all primes p, since every possible prime factorization
of a positive integer appears exactly once in the sum on the right-hand side. The
infinite product here is defined as a limit of finite products,
\[
\prod_{p} \left( 1 - \frac{1}{p^s} \right)^{-1} = \lim_{N \to \infty} \prod_{p \le N} \left( 1 - \frac{1}{p^s} \right)^{-1}.
\]
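The product formula can be tested numerically for $s = 2$, where the sum is known to be $\pi^2/6$. The Python sketch below is an illustration under stated assumptions: the function names are introduced here, and the primes are generated with a simple sieve that is not part of these notes.

```python
import math

def primes_up_to(N: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= N (assumes N >= 1)."""
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(N) + 1):
        if sieve[i]:
            for j in range(i * i, N + 1, i):
                sieve[j] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def euler_product(s: float, N: int) -> float:
    """Finite product over primes p <= N of (1 - p^(-s))^(-1)."""
    product = 1.0
    for p in primes_up_to(N):
        product *= 1.0 / (1.0 - p ** (-s))
    return product

def zeta_partial_sum(s: float, N: int) -> float:
    """Partial sum of the series defining zeta(s)."""
    return sum(1.0 / n ** s for n in range(1, N + 1))

# Both approximations approach pi^2 / 6 as N grows.
for N in (10, 100, 1000):
    print(N, euler_product(2, N), zeta_partial_sum(2, N), math.pi ** 2 / 6)
```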




Using complex analysis, one can show that the ζ-function may be extended
in a unique way to an analytic (i.e., differentiable) function of a complex variable
$s = \sigma + it \in \mathbb{C}$,
\[
\zeta : \mathbb{C} \setminus \{1\} \to \mathbb{C},
\]
where $\sigma = \Re s$ is the real part of s and $t = \Im s$ is the imaginary part. The
ζ-function has a singularity at $s = 1$, called a simple pole, where it goes to infinity like
$1/(1-s)$, and is equal to zero at the negative even integers $s = -2, -4, \dots, -2n, \dots$.
These zeros are called the trivial zeros of the ζ-function. Riemann (1859) made the
following conjecture.

Hypothesis 4.23 (Riemann hypothesis). Except for the trivial zeros, the only
zeros of the Riemann ζ-function occur on the line $\Re s = 1/2$.

If true, the Riemann hypothesis has significant consequences for the distribution
of primes (and many other things); roughly speaking, it implies that the prime
numbers are "randomly distributed" among the natural numbers (with density
$1/\log n$ near a large integer $n \in \mathbb{N}$). Despite enormous efforts, this conjecture has
neither been proved nor disproved, and it remains one of the most significant open
problems in mathematics (perhaps the most significant open problem).


4.6. The ratio and root tests 


In this section, we describe the ratio and root tests, which provide explicit sufficient 
conditions for the absolute convergence of a series that can be compared with a 
geometric series. These tests are particularly useful in studying power series, but 
they aren’t effective in determining the convergence or divergence of series whose 
terms do not approach zero at a geometric rate. 


Theorem 4.24 (Ratio test). Suppose that $(a_n)$ is a sequence of nonzero real numbers
such that the limit
\[
r = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|
\]
exists or diverges to infinity. Then the series
\[
\sum_{n=1}^{\infty} a_n
\]
converges absolutely if $0 \le r < 1$ and diverges if $1 < r \le \infty$.

Proof. If $r < 1$, choose s such that $r < s < 1$. Then there exists $N \in \mathbb{N}$ such that
\[
\left| \frac{a_{n+1}}{a_n} \right| < s \qquad \text{for all } n > N.
\]
It follows that
\[
|a_n| \le M s^n \qquad \text{for all } n > N,
\]
where M is a suitable constant. Therefore $\sum a_n$ converges absolutely by comparison
with the convergent geometric series $\sum M s^n$.

If $r > 1$, choose s such that $r > s > 1$. There exists $N \in \mathbb{N}$ such that
\[
\left| \frac{a_{n+1}}{a_n} \right| > s \qquad \text{for all } n > N,
\]
so that $|a_n| \ge M s^n$ for all $n > N$ and some $M > 0$. It follows that $(a_n)$ does not
approach 0 as $n \to \infty$, so the series diverges.


Example 4.25. Let $a \in \mathbb{R}$, and consider the series
\[
\sum_{n=1}^{\infty} n a^n = a + 2a^2 + 3a^3 + \dots.
\]
Then
\[
\lim_{n \to \infty} \left| \frac{(n+1) a^{n+1}}{n a^n} \right| = |a| \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right) = |a|.
\]
By the ratio test, the series converges if $|a| < 1$ and diverges if $|a| > 1$; the series
also diverges if $|a| = 1$. The convergence of the series for $|a| < 1$ is explained by
the fact that the geometric decay of the factor $a^n$ is more rapid than the algebraic
growth of the coefficient n.

Example 4.26. Let $p > 0$ and consider the p-series
\[
\sum_{n=1}^{\infty} \frac{1}{n^p}.
\]
Then
\[
\lim_{n \to \infty} \left[ \frac{1/(n+1)^p}{1/n^p} \right] = \lim_{n \to \infty} \left[ \frac{1}{(1 + 1/n)^p} \right] = 1,
\]
so the ratio test is inconclusive. In this case, the series diverges if $0 < p \le 1$ and
converges if $p > 1$, which shows that either possibility may occur when the limit in
the ratio test is 1.


The root test provides a criterion for convergence of a series that is closely re- 
lated to the ratio test, but it doesn’t require that the limit of the ratios of successive 
terms exists. 


Theorem 4.27 (Root test). Suppose that $(a_n)$ is a sequence of real numbers and
let
\[
r = \limsup_{n \to \infty} |a_n|^{1/n}.
\]
Then the series
\[
\sum_{n=1}^{\infty} a_n
\]
converges absolutely if $0 \le r < 1$ and diverges if $1 < r \le \infty$.

Proof. First suppose $0 \le r < 1$. If $0 < r < 1$, choose s such that $r < s < 1$, and
let
\[
t = \frac{r}{s}, \qquad r < t < 1.
\]
If $r = 0$, choose any $0 < t < 1$. Since $t > \limsup |a_n|^{1/n}$, Theorem implies
that there exists $N \in \mathbb{N}$ such that
\[
|a_n|^{1/n} < t \qquad \text{for all } n > N.
\]
Therefore $|a_n| < t^n$ for all $n > N$, where $t < 1$, so it follows that the series converges
by comparison with the convergent geometric series $\sum t^n$.

Next suppose $1 < r \le \infty$. If $1 < r < \infty$, choose s such that $1 < s < r$, and let
\[
t = \frac{r}{s}, \qquad 1 < t < r.
\]
If $r = \infty$, choose any $1 < t < \infty$. Since $t < \limsup |a_n|^{1/n}$, Theorem 3.41 implies
that
\[
|a_n|^{1/n} > t \qquad \text{for infinitely many } n \in \mathbb{N}.
\]
Therefore $|a_n| > t^n$ for infinitely many $n \in \mathbb{N}$, where $t > 1$, so $(a_n)$ does not
approach zero as $n \to \infty$, and the series diverges.


The root test may succeed where the ratio test fails. 


Example 4.28. Consider the geometric series with ratio 1/2,
\[
\sum_{n=1}^{\infty} a_n = \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \frac{1}{2^4} + \frac{1}{2^5} + \dots, \qquad a_n = \frac{1}{2^n}.
\]
Then (of course) both the ratio and root test imply convergence since
\[
\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = \limsup_{n \to \infty} |a_n|^{1/n} = \frac{1}{2} < 1.
\]
Now consider the series obtained by switching successive odd and even terms
\[
\sum_{n=1}^{\infty} b_n = \frac{1}{2^2} + \frac{1}{2} + \frac{1}{2^4} + \frac{1}{2^3} + \frac{1}{2^6} + \frac{1}{2^5} + \dots, \qquad
b_n = \begin{cases} 1/2^{n+1} & \text{if } n \text{ is odd}, \\ 1/2^{n-1} & \text{if } n \text{ is even}. \end{cases}
\]
For this series,
\[
\left| \frac{b_{n+1}}{b_n} \right| = \begin{cases} 2 & \text{if } n \text{ is odd}, \\ 1/8 & \text{if } n \text{ is even}, \end{cases}
\]
and the ratio test doesn't apply, since the required limit does not exist. (The series
still converges at a geometric rate, however, because the decrease in the terms
by a factor of 1/8 for even n dominates the increase by a factor of 2 for odd n.) On
the other hand
\[
\limsup_{n \to \infty} |b_n|^{1/n} = \frac{1}{2},
\]
so the root test still works. In fact, as we discuss in Section 4.8, since the series is
absolutely convergent, every rearrangement of it converges to the same sum.
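The contrast between the two tests for the switched series can be seen numerically. The Python sketch below is illustrative; the names `b`, `ratio` and `nth_root` are assumptions made for this sketch.

```python
def b(n: int) -> float:
    """Terms of the switched geometric series: 1/2^(n+1) for odd n, 1/2^(n-1) for even n."""
    return 2.0 ** -(n + 1) if n % 2 == 1 else 2.0 ** -(n - 1)

def ratio(n: int) -> float:
    """Ratio |b_{n+1} / b_n| used in the ratio test."""
    return b(n + 1) / b(n)

def nth_root(n: int) -> float:
    """Quantity |b_n|^(1/n) used in the root test."""
    return b(n) ** (1.0 / n)

# The ratios oscillate between 2 and 1/8, while the n-th roots approach 1/2.
for n in (1, 2, 3, 4, 50, 51, 200, 201):
    print(n, ratio(n), nth_root(n))
```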


4.7. Alternating series 


An alternating series is one in which successive terms have opposite signs. If the 
terms in an alternating series have decreasing absolute values and converge to zero, 
then the series converges however slowly its terms approach zero. This allows us to 
prove the convergence of some series which aren’t absolutely convergent. 


Example 4.29. The alternating harmonic series from Example is
\[
\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \dots.
\]
The behavior of its partial sums is shown in Figure 1, which illustrates the idea of
the convergence proof for alternating series.

Figure 1. A plot of the first 40 partial sums $S_n$ of the alternating harmonic
series in Example 4.14. The odd partial sums decrease and the even partial
sums increase to the sum of the series $\log 2 \approx 0.6931$, which is indicated by
the dashed line.


Theorem 4.30 (Alternating series). Suppose that $(a_n)$ is a decreasing sequence
of nonnegative real numbers, meaning that $0 \le a_{n+1} \le a_n$, such that $a_n \to 0$ as
$n \to \infty$. Then the alternating series
\[
\sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - a_4 + a_5 - \dots
\]
converges.

Proof. Let
\[
S_n = \sum_{k=1}^{n} (-1)^{k+1} a_k
\]
denote the nth partial sum. If $n = 2m - 1$ is odd, then
\[
S_{2m-1} = S_{2m-3} - a_{2m-2} + a_{2m-1} \le S_{2m-3},
\]
since $(a_n)$ is decreasing, and
\[
S_{2m-1} = (a_1 - a_2) + (a_3 - a_4) + \dots + (a_{2m-3} - a_{2m-2}) + a_{2m-1} \ge 0.
\]
Thus, the sequence $(S_{2m-1})$ of odd partial sums is decreasing and bounded from
below by 0, so $S_{2m-1} \downarrow S^+$ as $m \to \infty$ for some $S^+ \ge 0$.

Similarly, if $n = 2m$ is even, then
\[
S_{2m} = S_{2m-2} + a_{2m-1} - a_{2m} \ge S_{2m-2},
\]
and
\[
S_{2m} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \dots - (a_{2m-2} - a_{2m-1}) - a_{2m} \le a_1.
\]
Thus, $(S_{2m})$ is increasing and bounded from above by $a_1$, so $S_{2m} \uparrow S^- \le a_1$ as
$m \to \infty$.

Finally, note that
\[
\lim_{m \to \infty} (S_{2m-1} - S_{2m}) = \lim_{m \to \infty} a_{2m} = 0,
\]
so $S^+ = S^-$, which implies that the series converges to their common value.

The proof also shows that the sum $S_{2m} \le S \le S_{2n-1}$ is bounded from below
and above by all even and odd partial sums, respectively, and that the error $|S_n - S|$
is at most the first term $a_{n+1}$ in the series that is neglected.
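The error estimate can be checked for the alternating harmonic series, whose sum is $\log 2$ (as stated in the caption of Figure 1). The Python sketch below is illustrative only; the function name is an assumption made here.

```python
import math

def alternating_harmonic_partial_sum(n: int) -> float:
    """S_n = sum_{k=1}^n (-1)^(k+1) / k."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

LOG2 = math.log(2)  # the sum S of the alternating harmonic series

# Compare the error |S_n - S| with the first neglected term a_{n+1} = 1/(n+1).
for n in (1, 2, 5, 10, 100):
    s_n = alternating_harmonic_partial_sum(n)
    print(n, s_n, abs(s_n - LOG2), 1 / (n + 1))
```

The bound guarantees convergence but also shows how slow it can be: about $1/\varepsilon$ terms are needed for an error below $\varepsilon$.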


Example 4.31. The alternating p-series
\[
\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^p}
\]
converges for every $p > 0$. The convergence is absolute for $p > 1$ and conditional
for $0 < p \le 1$.


4.8. Rearrangements 


A rearrangement of a series is a series that consists of the same terms in a different 
order. The convergence of rearranged series may initially appear to be unconnected 
with absolute convergence, but absolutely convergent series are exactly those series 
whose sums remain the same under every rearrangement of their terms. On the 
other hand, a conditionally convergent series can be rearranged to give any sum we 
please, or to diverge. 


Example 4.32. A rearrangement of the alternating harmonic series in Example 4.14 is
\[
1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \dots,
\]
where we put two negative even terms between each of the positive odd terms. The
behavior of its partial sums is shown in Figure 2. As proved in Example
this series converges to one-half of the sum of the alternating harmonic series. The
sum of the alternating harmonic series can change under rearrangement because it
is conditionally convergent.
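The halving of the sum can be observed numerically. In the Python sketch below (an illustration; the function name is assumed), the rearranged series is summed in blocks of one positive term followed by two negative terms.

```python
import math

def rearranged_partial_sum(num_blocks: int) -> float:
    """Sum the first num_blocks blocks 1/(2k-1) - 1/(4k-2) - 1/(4k) of the rearrangement."""
    total = 0.0
    for k in range(1, num_blocks + 1):
        total += 1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
    return total

# The partial sums approach (1/2) log 2, half the sum of the original series.
for blocks in (10, 100, 10_000):
    print(blocks, rearranged_partial_sum(blocks), 0.5 * math.log(2))
```

Each block simplifies to $\frac{1}{2}\left(\frac{1}{2k-1} - \frac{1}{2k}\right)$, which is one way to see where the factor of one-half comes from.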


Note also that both the positive and negative parts of the alternating harmonic
series diverge to infinity, since
\[
1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \dots > \frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \dots = \frac{1}{2} \left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots \right),
\]
and the harmonic series diverges. This is what allows us to change the sum by
rearranging the series.


Figure 2. A plot of the first 40 partial sums $S_n$ of the rearranged alternating
harmonic series in Example. The series converges to half the sum of the
alternating harmonic series, $\frac{1}{2} \log 2 \approx 0.3466$. Compare this picture with Figure.


The formal definition of a rearrangement is as follows. 
Definition 4.33. A series
\[
\sum_{m=1}^{\infty} b_m
\]
is a rearrangement of a series
\[
\sum_{n=1}^{\infty} a_n
\]
if there is a one-to-one, onto function $f : \mathbb{N} \to \mathbb{N}$ such that $b_m = a_{f(m)}$.

If $\sum b_m$ is a rearrangement of $\sum a_n$ with $n = f(m)$, then $\sum a_n$ is a rearrangement
of $\sum b_m$, with $m = f^{-1}(n)$.


Theorem 4.34. If a series is absolutely convergent, then every rearrangement of 
the series converges to the same sum. 


Proof. First, suppose that
\[
\sum_{n=1}^{\infty} a_n
\]
is a convergent series with $a_n \ge 0$, and let
\[
\sum_{m=1}^{\infty} b_m, \qquad b_m = a_{f(m)},
\]
be a rearrangement.

Given $\epsilon > 0$, choose $N \in \mathbb{N}$ such that
\[
0 \le \sum_{k=1}^{\infty} a_k - \sum_{k=1}^{N} a_k < \epsilon.
\]
Since $f : \mathbb{N} \to \mathbb{N}$ is one-to-one and onto, there exists $M \in \mathbb{N}$ such that
\[
\{1, 2, \dots, N\} \subset f(\{1, 2, \dots, M\}),
\]
meaning that all of the terms $a_1, a_2, \dots, a_N$ are included among the $b_1, b_2, \dots, b_M$.
For example, we can take $M = \max\{m \in \mathbb{N} : 1 \le f(m) \le N\}$; this maximum is
well-defined since there are finitely many such m (in fact, N of them).

If $m > M$, then
\[
\sum_{k=1}^{N} a_k \le \sum_{j=1}^{m} b_j \le \sum_{k=1}^{\infty} a_k,
\]
since the $b_j$'s include all the $a_k$'s in the left sum, all the $b_j$'s are included among
the $a_k$'s in the right sum, and $a_k, b_j \ge 0$. It follows that
\[
0 \le \sum_{k=1}^{\infty} a_k - \sum_{j=1}^{m} b_j < \epsilon
\]
for all $m > M$, which proves that
\[
\sum_{j=1}^{\infty} b_j = \sum_{k=1}^{\infty} a_k.
\]

If $\sum a_n$ is a general absolutely convergent series, then from Proposition 4.17
the positive and negative parts of the series
\[
\sum_{n=1}^{\infty} a_n^+, \qquad \sum_{n=1}^{\infty} a_n^-
\]
converge. If $\sum b_m$ is a rearrangement of $\sum a_n$, then $\sum b_m^+$ and $\sum b_m^-$ are rearrangements
of $\sum a_n^+$ and $\sum a_n^-$, respectively. It follows from what we've just proved that
they converge and
\[
\sum_{m=1}^{\infty} b_m^+ = \sum_{n=1}^{\infty} a_n^+, \qquad \sum_{m=1}^{\infty} b_m^- = \sum_{n=1}^{\infty} a_n^-.
\]
Proposition then implies that $\sum b_m$ is absolutely convergent and
\[
\sum_{m=1}^{\infty} b_m = \sum_{m=1}^{\infty} b_m^+ - \sum_{m=1}^{\infty} b_m^- = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^- = \sum_{n=1}^{\infty} a_n,
\]
which proves the result.


Figure 3. A plot of the first 40 partial sums $S_n$ of the rearranged alternating
harmonic series described in Example, which converges to $\sqrt{2}$.


Conditionally convergent series behave completely differently from absolutely 
convergent series under rearrangement. As Riemann observed, they can be rear- 
ranged to give any sum we want, or to diverge. Before giving the proof, we illustrate 
the idea with an example. 


Example 4.35. Suppose we want to rearrange the alternating harmonic series
\[
1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \dots
\]
so that its sum is $\sqrt{2} \approx 1.4142$. We choose positive terms until we get a partial sum
that is greater than $\sqrt{2}$, which gives $1 + 1/3 + 1/5$; followed by negative terms until
we get a sum less than $\sqrt{2}$, which gives $1 + 1/3 + 1/5 - 1/2$; followed by positive
terms until we get a sum greater than $\sqrt{2}$, which gives
\[
1 + \frac{1}{3} + \frac{1}{5} - \frac{1}{2} + \frac{1}{7} + \frac{1}{9} + \frac{1}{11} + \frac{1}{13};
\]
followed by another negative term $-1/4$ to get a sum less than $\sqrt{2}$; and so on. The
first 40 partial sums of the resulting series are shown in Figure 3.
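The procedure in this example is a greedy algorithm and is easy to sketch in Python. The code below is an illustration under stated assumptions: the function name `greedy_rearrangement` is introduced here, and floating-point arithmetic stands in for exact real numbers.

```python
import math
from itertools import count

def greedy_rearrangement(target: float, num_terms: int) -> list[float]:
    """First num_terms terms of a rearrangement of 1 - 1/2 + 1/3 - ... aiming at target."""
    positives = (1.0 / n for n in count(1, 2))    # 1, 1/3, 1/5, ...
    negatives = (-1.0 / n for n in count(2, 2))   # -1/2, -1/4, -1/6, ...
    terms: list[float] = []
    total = 0.0
    while len(terms) < num_terms:
        # Take the next unused positive term while the running sum has not
        # exceeded the target; otherwise take the next unused negative term.
        term = next(positives) if total <= target else next(negatives)
        terms.append(term)
        total += term
    return terms

terms = greedy_rearrangement(math.sqrt(2), 40)
print(terms[:9])
print(sum(terms), math.sqrt(2))
```

Every term of the original series is eventually used exactly once, because both the positive and the negative parts diverge, so the running sum must cross the target infinitely often.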


Theorem 4.36. If a series is conditionally convergent, then it has rearrangements
that converge to an arbitrary real number and rearrangements that diverge to $\infty$
or $-\infty$.

Proof. Suppose that $\sum a_n$ is conditionally convergent. Since the series converges,
$a_n \to 0$ as $n \to \infty$. If both the positive part $\sum a_n^+$ and negative part $\sum a_n^-$ of the
series converge, then the series converges absolutely; and if only one part diverges,
then the series diverges (to $\infty$ if $\sum a_n^+$ diverges, or $-\infty$ if $\sum a_n^-$ diverges). Therefore
both $\sum a_n^+$ and $\sum a_n^-$ diverge. This means that we can make sums of successive
positive or negative terms in the series as large as we wish.

Suppose $S \in \mathbb{R}$. Starting from the beginning of the series, we choose successive
positive or zero terms in the series until their partial sum is greater than or equal
to S. Then we choose successive strictly negative terms, starting again from the
beginning of the series, until the partial sum of all the terms is strictly less than
S. After that, we choose successive positive or zero terms until the partial sum is
greater than or equal to S, followed by negative terms until the partial sum is strictly
less than S, and so on. The partial sums are greater than S by at most the value
of the last positive term retained, and are less than S by at most the absolute value of the
last negative term retained. Since $a_n \to 0$ as $n \to \infty$, it follows that the rearranged
series converges to S.

A similar argument shows that we can rearrange a conditionally convergent series
to diverge to $\infty$ or $-\infty$, and that we can rearrange the series so that it diverges in
a finite or infinite oscillatory fashion.


The previous results indicate that conditionally convergent series behave in 
many ways more like divergent series than absolutely convergent series. 


4.9. The Cauchy product 


In this section, we prove a result about the product of absolutely convergent series 
that is useful in multiplying power series. It is convenient to begin numbering the 
terms of the series at n = 0. 


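Definition 4.37. The Cauchy product of the series
\[
\sum_{n=0}^{\infty} a_n, \qquad \sum_{n=0}^{\infty} b_n
\]
is the series
\[
\sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_k b_{n-k} \right).
\]

The Cauchy product arises formally by term-by-term multiplication and
rearrangement:
\[
\begin{aligned}
(a_0 + a_1 + a_2 + a_3 + \dots)(b_0 + b_1 + b_2 + b_3 + \dots)
&= a_0 b_0 + a_0 b_1 + a_0 b_2 + a_0 b_3 + \dots + a_1 b_0 + a_1 b_1 + a_1 b_2 + \dots \\
&\quad + a_2 b_0 + a_2 b_1 + \dots + a_3 b_0 + \dots \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) \\
&\quad + (a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0) + \dots.
\end{aligned}
\]

In general, writing $m = n - k$, we have formally that
\[
\left( \sum_{n=0}^{\infty} a_n \right) \left( \sum_{n=0}^{\infty} b_n \right) = \sum_{k=0}^{\infty} \sum_{m=0}^{\infty} a_k b_m = \sum_{n=0}^{\infty} \sum_{k=0}^{n} a_k b_{n-k}.
\]

There are no convergence issues about the individual terms in the Cauchy product,
since $\sum_{k=0}^{n} a_k b_{n-k}$ is a finite sum.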


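Theorem 4.38 (Cauchy product). If the series
\[
\sum_{n=0}^{\infty} a_n, \qquad \sum_{n=0}^{\infty} b_n
\]
are absolutely convergent, then the Cauchy product is absolutely convergent and
\[
\sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_k b_{n-k} \right) = \left( \sum_{n=0}^{\infty} a_n \right) \left( \sum_{n=0}^{\infty} b_n \right).
\]

Proof. For every $N \in \mathbb{N}$, we have
\[
\begin{aligned}
\sum_{n=0}^{N} \left| \sum_{k=0}^{n} a_k b_{n-k} \right|
&\le \sum_{n=0}^{N} \left( \sum_{k=0}^{n} |a_k| |b_{n-k}| \right) \\
&\le \left( \sum_{k=0}^{N} |a_k| \right) \left( \sum_{m=0}^{N} |b_m| \right) \\
&\le \left( \sum_{n=0}^{\infty} |a_n| \right) \left( \sum_{n=0}^{\infty} |b_n| \right).
\end{aligned}
\]
Thus, the Cauchy product is absolutely convergent, since the partial sums of its
absolute values are bounded from above.

Since the series for the Cauchy product is absolutely convergent, any rearrangement
of it converges to the same sum. In particular, the subsequence of partial sums
given by
\[
\left( \sum_{n=0}^{N} a_n \right) \left( \sum_{n=0}^{N} b_n \right) = \sum_{n=0}^{N} \sum_{m=0}^{N} a_n b_m
\]
corresponds to a rearrangement of the Cauchy product, so
\[
\sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_k b_{n-k} \right) = \lim_{N \to \infty} \left( \sum_{n=0}^{N} a_n \right) \left( \sum_{n=0}^{N} b_n \right) = \left( \sum_{n=0}^{\infty} a_n \right) \left( \sum_{n=0}^{\infty} b_n \right).
\]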


In fact, as we discuss in the next section, since the series of term-by-term 
products of absolutely convergent series converges absolutely, every rearrangement 
of the product series — not just the one in the Cauchy product — converges to the 
product of the sums. 
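As a concrete check of the Cauchy product formula, the Python sketch below multiplies the geometric series $\sum x^n$ by itself for $x = 1/2$, using exact rational arithmetic. It is an illustration only; the function name `cauchy_product` and the choice of series are assumptions made here.

```python
from fractions import Fraction

def cauchy_product(a, b):
    """Return c_n = sum_{k=0}^n a_k * b_{n-k} for n = 0, ..., min(len(a), len(b)) - 1."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

x = Fraction(1, 2)
a = [x ** n for n in range(40)]      # first 40 terms of the geometric series
c = cauchy_product(a, a)             # here c_n = (n + 1) * x^n

# The Cauchy product sums to the product of the sums, (1 / (1 - x))^2.
print(float(sum(c)), float(1 / (1 - x) ** 2))
```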


4.10. * Double series 


A double series is a series of the form
\[
\sum_{m,n=1}^{\infty} a_{mn},
\]
where the terms $a_{mn}$ are indexed by a pair of natural numbers $m, n \in \mathbb{N}$. More
formally, the terms are defined by a function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$ where $a_{mn} = f(m,n)$.
In this section, we consider sums of double series; this material is not used later on.


There are many different ways to define the sum of a double series since, unlike 
the index set N of a series, the index set N x N of a double series does not come 
with a natural linear order. Our main interest here is in giving a definition of the 
sum of a double series that does not depend on the order of its terms. As for series, 
this unordered sum exists if and only if the double series is absolutely convergent. 


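If $F \subset \mathbb{N} \times \mathbb{N}$ is a finite subset of pairs of natural numbers, then we denote by
\[
\sum_{F} a_{mn} = \sum_{(m,n) \in F} a_{mn}
\]
the partial sum of all terms $a_{mn}$ whose indices $(m,n)$ belong to F.

Definition 4.39. The unordered sum of nonnegative real numbers $a_{mn} \ge 0$ is
\[
\sum_{\mathbb{N} \times \mathbb{N}} a_{mn} = \sup_{F \in \mathcal{F}} \left\{ \sum_{F} a_{mn} \right\}
\]
where the supremum is taken over the collection $\mathcal{F}$ of all finite subsets $F \subset \mathbb{N} \times \mathbb{N}$.
The unordered sum converges if this supremum is finite and diverges to $\infty$ if this
supremum is $\infty$.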


In other words, the unordered sum of a double series of nonnegative terms is 
the supremum of the set of all finite partial sums of the series. Note that this 
supremum exists if and only if the finite partial sums of the series are bounded 
from above. 


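Example 4.40. The unordered sum
\[
\sum_{(m,n) \in \mathbb{N} \times \mathbb{N}} \frac{1}{(m+n)^p} = \frac{1}{(1+1)^p} + \frac{1}{(1+2)^p} + \frac{1}{(2+1)^p} + \frac{1}{(1+3)^p} + \frac{1}{(2+2)^p} + \dots
\]
converges if $p > 2$ and diverges if $p \le 2$. To see this, first note that if
\[
T = \{(m,n) \in \mathbb{N} \times \mathbb{N} : 2 \le m + n \le N\}
\]
is a "triangular" set of indices, then, since there are $k - 1$ pairs $(m,n)$ with $m + n = k$,
\[
\sum_{T} \frac{1}{(m+n)^p} = \sum_{m=1}^{N-1} \sum_{n=1}^{N-m} \frac{1}{(m+n)^p} = \sum_{k=2}^{N} \frac{k-1}{k^p} \ge \frac{1}{2} \sum_{k=2}^{N} \frac{1}{k^{p-1}}.
\]
It follows that
\[
\sum_{\mathbb{N} \times \mathbb{N}} \frac{1}{(m+n)^p} \ge \frac{1}{2} \sum_{k=2}^{N} \frac{1}{k^{p-1}}
\]
for every $N \in \mathbb{N}$, so the double series diverges if $p \le 2$ since the $(p-1)$-series
diverges. Moreover, if $p > 2$ and F is a finite subset of $\mathbb{N} \times \mathbb{N}$, then there exists a
triangular set T such that $F \subset T$, so that
\[
\sum_{F} \frac{1}{(m+n)^p} \le \sum_{T} \frac{1}{(m+n)^p} < \sum_{k=2}^{\infty} \frac{1}{k^{p-1}}.
\]
It follows that the unordered sum converges if $p > 2$, with
\[
\sum_{\mathbb{N} \times \mathbb{N}} \frac{1}{(m+n)^p} \le \sum_{k=2}^{\infty} \frac{1}{k^{p-1}}.
\]
Note that this double p-series converges only if its terms approach zero at a faster
rate than the terms in a single p-series (of degree greater than 2 in $(m,n)$ rather
than degree greater than 1 in n).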


We define a general unordered sum of real numbers by summing its positive and 
negative terms separately. (Recall the notation in Definition for the positive 
and negative parts of a real number.) 


Definition 4.41. The unordered sum of a double series of real numbers is
\[
\sum_{\mathbb{N} \times \mathbb{N}} a_{mn} = \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ - \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-
\]
where $a_{mn} = a_{mn}^+ - a_{mn}^-$ is the decomposition of $a_{mn}$ into its positive and negative
parts. The unordered sum converges if both $\sum a_{mn}^+$ and $\sum a_{mn}^-$ are finite; diverges
to $\infty$ if $\sum a_{mn}^+ = \infty$ and $\sum a_{mn}^-$ is finite; diverges to $-\infty$ if $\sum a_{mn}^+$ is finite and
$\sum a_{mn}^- = \infty$; and is undefined if both $\sum a_{mn}^+$ and $\sum a_{mn}^-$ diverge to $\infty$.

This definition does not require us to order the index set $\mathbb{N} \times \mathbb{N}$ in any way; in
fact, the same definition applies to any series of real numbers
\[
\sum_{i \in I} a_i
\]
whose terms are indexed by an arbitrary set I. A sum over a set of indices will
always denote an unordered sum.

Definition 4.42. A double series
\[
\sum_{m,n=1}^{\infty} a_{mn}
\]
of real numbers converges absolutely if the unordered sum
\[
\sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}| < \infty
\]
converges.

The following result is a straightforward consequence of the definitions and the 
completeness of R. 




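Proposition 4.43. An unordered sum $\sum a_{mn}$ of real numbers converges if and
only if it converges absolutely, and in that case
\[
\sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}| = \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ + \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-.
\]

Proof. First, suppose that the unordered sum
\[
\sum_{\mathbb{N} \times \mathbb{N}} a_{mn}
\]
converges absolutely. If $F \subset \mathbb{N} \times \mathbb{N}$ is any finite subset, then
\[
\sum_{F} a_{mn}^{\pm} \le \sum_{F} |a_{mn}| \le \sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}|.
\]
It follows that the finite partial sums of the positive and negative terms of the series
are bounded from above, so the unordered sum converges.

Conversely, suppose that the unordered sum converges. If $F \subset \mathbb{N} \times \mathbb{N}$ is any
finite subset, then
\[
\sum_{F} |a_{mn}| = \sum_{F} a_{mn}^+ + \sum_{F} a_{mn}^- \le \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ + \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-.
\]
It follows that the finite partial sums of the absolute values of the terms in the series
are bounded from above, so the unordered sum converges absolutely. Furthermore,
\[
\sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}| \le \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ + \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-.
\]
To prove the inequality in the other direction, let $\epsilon > 0$. There exist finite sets
$F_+, F_- \subset \mathbb{N} \times \mathbb{N}$ such that
\[
\sum_{F_+} a_{mn}^+ > \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ - \frac{\epsilon}{2}, \qquad
\sum_{F_-} a_{mn}^- > \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^- - \frac{\epsilon}{2}.
\]
Let $F = F_+ \cup F_-$. Then, since $a_{mn}^{\pm} \ge 0$, we have
\[
\begin{aligned}
\sum_{F} |a_{mn}| &= \sum_{F} a_{mn}^+ + \sum_{F} a_{mn}^- \\
&\ge \sum_{F_+} a_{mn}^+ + \sum_{F_-} a_{mn}^- \\
&> \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ + \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^- - \epsilon.
\end{aligned}
\]
Since $\epsilon > 0$ is arbitrary, it follows that
\[
\sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}| \ge \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ + \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-,
\]
which completes the proof.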


Next, we define rearrangements of a double series into single series and show 
that every rearrangement of a convergent unordered sum converges to the same 
sum. 




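Definition 4.44. A rearrangement of a double series
\[
\sum_{m,n=1}^{\infty} a_{mn}
\]
is a series of the form
\[
\sum_{k=1}^{\infty} b_k, \qquad b_k = a_{\sigma(k)},
\]
where $\sigma : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ is a one-to-one, onto map.

Example 4.45. The rearrangement corresponding to the map $f : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$
defined in the proof of Proposition is given by
\[
\sum_{m,n=1}^{\infty} a_{mn} = a_{11} + a_{21} + a_{12} + a_{31} + a_{22} + a_{13} + a_{41} + a_{32} + a_{23} + a_{14} + \dots.
\]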


Theorem 4.46. If the unordered sum of a double series of real numbers converges, 
then every rearrangement of the double series into a single series converges to the 
unordered sum. 


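Proof. Suppose that the unordered sum $\sum a_{mn}$ converges with
\[
\sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+ = S_+, \qquad \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^- = S_-,
\]
and let
\[
\sum_{k=1}^{\infty} b_k
\]
be a rearrangement of the double series corresponding to a map $\sigma : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$.
For $k \in \mathbb{N}$, let
\[
F_k = \{(m,n) \in \mathbb{N} \times \mathbb{N} : (m,n) = \sigma(j) \text{ for some } 1 \le j \le k\},
\]
so that
\[
\sum_{j=1}^{k} b_j = \sum_{F_k} a_{mn} = \sum_{F_k} a_{mn}^+ - \sum_{F_k} a_{mn}^-.
\]
Given $\epsilon > 0$, choose finite sets
\[
F_+, F_- \subset \mathbb{N} \times \mathbb{N}
\]
such that
\[
S_+ - \epsilon < \sum_{F_+} a_{mn}^+ \le S_+, \qquad S_- - \epsilon < \sum_{F_-} a_{mn}^- \le S_-,
\]
and define $N \in \mathbb{N}$ by
\[
N = \max\{j \in \mathbb{N} : \sigma(j) \in F_+ \cup F_-\}.
\]
If $k \ge N$, then $F_k \supset F_+ \cup F_-$ and, since $a_{mn}^{\pm} \ge 0$,
\[
\sum_{F_+} a_{mn}^+ \le \sum_{F_k} a_{mn}^+ \le S_+, \qquad \sum_{F_-} a_{mn}^- \le \sum_{F_k} a_{mn}^- \le S_-.
\]
It follows that
\[
S_+ - \epsilon < \sum_{F_k} a_{mn}^+ \le S_+, \qquad S_- - \epsilon < \sum_{F_k} a_{mn}^- \le S_-,
\]
which implies that
\[
\left| \sum_{j=1}^{k} b_j - (S_+ - S_-) \right| < \epsilon.
\]
This inequality proves that the rearrangement $\sum b_k$ converges to the unordered
sum $S_+ - S_-$ of the double series.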


The rearrangement of a double series into a single series is one natural way to 
interpret a double series in terms of single series. Another way is to use iterated 
sums of single series. 


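Given a double series $\sum a_{mn}$, one can define two iterated sums, obtained by
summing first over one index followed by the other:
\[
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn} \right) = \lim_{M \to \infty} \sum_{m=1}^{M} \left( \lim_{N \to \infty} \sum_{n=1}^{N} a_{mn} \right),
\]
\[
\sum_{n=1}^{\infty} \left( \sum_{m=1}^{\infty} a_{mn} \right) = \lim_{N \to \infty} \sum_{n=1}^{N} \left( \lim_{M \to \infty} \sum_{m=1}^{M} a_{mn} \right).
\]
As the following example shows, these iterated sums may not be equal, even if both
of them converge.

Example 4.47. Define $a_{mn}$ by
\[
a_{mn} = \begin{cases} 1 & \text{if } n = m + 1, \\ -1 & \text{if } m = n + 1, \\ 0 & \text{otherwise}. \end{cases}
\]
Then, by writing out the terms $a_{mn}$ in a table, one can see that
\[
\sum_{n=1}^{\infty} a_{mn} = \begin{cases} 1 & \text{if } m = 1, \\ 0 & \text{otherwise}, \end{cases}
\qquad
\sum_{m=1}^{\infty} a_{mn} = \begin{cases} -1 & \text{if } n = 1, \\ 0 & \text{otherwise}, \end{cases}
\]
so that
\[
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn} \right) = 1, \qquad \sum_{n=1}^{\infty} \left( \sum_{m=1}^{\infty} a_{mn} \right) = -1.
\]
Note that for this series
\[
\sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}| = \infty,
\]
so it is not absolutely convergent. Furthermore, both of the sums
\[
\sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+, \qquad \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-
\]
diverge to $\infty$, so the unordered sum is not well-defined.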
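The two iterated sums in this example can be computed directly, because each row and each column of the table of terms contains at most two nonzero entries. The Python sketch below is illustrative; the function names `a`, `row_sum` and `col_sum` are assumptions made here.

```python
def a(m: int, n: int) -> int:
    """Terms of Example 4.47: +1 just above the diagonal, -1 just below it."""
    if n == m + 1:
        return 1
    if m == n + 1:
        return -1
    return 0

def row_sum(m: int) -> int:
    """Exact value of sum over n of a(m, n); nonzero terms occur only for n <= m + 1."""
    return sum(a(m, n) for n in range(1, m + 2))

def col_sum(n: int) -> int:
    """Exact value of sum over m of a(m, n); nonzero terms occur only for m <= n + 1."""
    return sum(a(m, n) for m in range(1, n + 2))

M = 50
print(sum(row_sum(m) for m in range(1, M + 1)))   # rows first
print(sum(col_sum(n) for n in range(1, M + 1)))   # columns first
```

Only the first row and the first column have nonzero sums, which is why the order of summation changes the answer.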




The following basic result, which is a special case of Fubini’s theorem, guaran- 
tees that both of the iterated sums of an absolutely convergent double series exist 
and are equal to the unordered sum. It also gives a criterion for the absolute con- 
vergence of a double series in terms of the convergence of its iterated sums. The 
key point of the proof is that a double sum over a “rectangular” set of indices is 
equal to an iterated sum. We can then estimate sums over arbitrary finite subsets 
of indices in terms of sums over rectangles. 


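Theorem 4.48 (Fubini for double series). A double series of real numbers $\sum a_{mn}$
converges absolutely if and only if either one of the iterated series
\[
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} |a_{mn}| \right), \qquad \sum_{n=1}^{\infty} \left( \sum_{m=1}^{\infty} |a_{mn}| \right)
\]
converges. In that case, both iterated series converge to the unordered sum of the
double series:
\[
\sum_{\mathbb{N} \times \mathbb{N}} a_{mn} = \sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn} \right) = \sum_{n=1}^{\infty} \left( \sum_{m=1}^{\infty} a_{mn} \right).
\]

Proof. First suppose that one of the iterated sums of the absolute values exists.
Without loss of generality, we suppose that
\[
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} |a_{mn}| \right) < \infty.
\]
Let $F \subset \mathbb{N} \times \mathbb{N}$ be a finite subset. Choose $M, N \in \mathbb{N}$ such that $m \le M$ and
$n \le N$ for all $(m,n) \in F$. Then $F \subset R$ where the rectangle R is given by
$R = \{1, 2, \dots, M\} \times \{1, 2, \dots, N\}$, so that
\[
\sum_{F} |a_{mn}| \le \sum_{R} |a_{mn}| = \sum_{m=1}^{M} \left( \sum_{n=1}^{N} |a_{mn}| \right) \le \sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} |a_{mn}| \right).
\]
Thus, the finite partial sums of $\sum |a_{mn}|$ are bounded from above, so the unordered
sum converges absolutely.

Conversely, suppose that the unordered sum converges absolutely. Then, using
Proposition and the fact that the supremum of partial sums of non-negative
terms over rectangles in $\mathbb{N} \times \mathbb{N}$ is equal to the supremum over all finite subsets, we
get that
\[
\begin{aligned}
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} |a_{mn}| \right) &= \lim_{M \to \infty} \sum_{m=1}^{M} \left( \lim_{N \to \infty} \sum_{n=1}^{N} |a_{mn}| \right) \\
&= \sup_{(M,N) \in \mathbb{N} \times \mathbb{N}} \sum_{m=1}^{M} \sum_{n=1}^{N} |a_{mn}| \\
&= \sum_{\mathbb{N} \times \mathbb{N}} |a_{mn}|.
\end{aligned}
\]
Thus, the iterated sums converge to the unordered sum. Moreover, we have similarly
that
\[
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn}^+ \right) = \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^+, \qquad
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn}^- \right) = \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}^-,
\]
which implies that
\[
\sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn} \right) = \sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn}^+ \right) - \sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn}^- \right) = \sum_{\mathbb{N} \times \mathbb{N}} a_{mn}.
\]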


The preceding results show that the sum of an absolutely convergent double 
series is unambiguous; unordered sums, sums of rearrangements into single series, 
and iterated sums all converge to the same value. On the other hand, the sum 
of a conditionally convergent double series depends on how it is defined, e.g., on 
how one chooses to rearrange the double series into a single series. We conclude by 
describing one other way to define double sums. 


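Example 4.49. A natural way to generalize the sum of a single series to a double
series, going back to Cauchy (1821) and Pringsheim (1897), is to say that
\[
\sum_{m,n=1}^{\infty} a_{mn} = S
\]
if for every $\epsilon > 0$ there exist $M, N \in \mathbb{N}$ such that
\[
\left| \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} - S \right| < \epsilon
\]
for all $m > M$ and $n > N$. We write this definition briefly as
\[
\sum_{m,n=1}^{\infty} a_{mn} = \lim_{M,N \to \infty} \left[ \sum_{m=1}^{M} \sum_{n=1}^{N} a_{mn} \right].
\]
That is, we sum over larger and larger "rectangles" of indices in the $\mathbb{N} \times \mathbb{N}$-plane.

An absolutely convergent series converges in this sense to the same sum, but
some conditionally convergent series also converge. For example, using this definition
of the sum, we have
\[
\begin{aligned}
\sum_{m,n=1}^{\infty} \frac{(-1)^{m+n}}{mn} &= \lim_{M,N \to \infty} \left[ \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{(-1)^{m+n}}{mn} \right] \\
&= \lim_{M \to \infty} \left[ \sum_{m=1}^{M} \frac{(-1)^m}{m} \right] \lim_{N \to \infty} \left[ \sum_{n=1}^{N} \frac{(-1)^n}{n} \right] \\
&= \left[ \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \right]^2 \\
&= (\log 2)^2,
\end{aligned}
\]
but the series is not absolutely convergent, since the sums of both its positive and
negative terms diverge to $\infty$.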
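The rectangular sums in this example can be evaluated numerically. The Python sketch below is an illustration only; the function name `rectangle_sum` is an assumption made here.

```python
import math

def rectangle_sum(M: int, N: int) -> float:
    """Sum of (-1)^(m+n) / (m n) over the rectangle 1 <= m <= M, 1 <= n <= N."""
    return sum(
        (-1) ** (m + n) / (m * n)
        for m in range(1, M + 1)
        for n in range(1, N + 1)
    )

# Sums over growing rectangles approach (log 2)^2, whatever the shape of the rectangle.
for M, N in ((10, 10), (50, 200), (200, 200)):
    print(M, N, rectangle_sum(M, N), math.log(2) ** 2)
```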


This definition of a sum is not, however, as easy to use as the unordered sum of 
an absolutely convergent series. For example, it does not satisfy Fubini’s theorem 
(although one can show that if the sum of a double series exists in this sense and 
the iterated sums also exist, then they are equal [16]). 


4.11. * The irrationality of e 


In this section, we use series to prove that e is an irrational number. In
Proposition 3.32 we defined e ≈ 2.71828... as the limit
\[
e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n}.
\]


We first obtain an alternative expression for e as the sum of a series. 


Proposition 4.50.
\[
e = \sum_{n=0}^{\infty} \frac{1}{n!}.
\]


Proof. Using the binomial theorem, as in the proof of Proposition we find 


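that
\[
\begin{aligned}
\left(1 + \frac{1}{n}\right)^{n}
&= 2 + \frac{1}{2!}\left(1 - \frac{1}{n}\right)
   + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots \\
&\quad + \frac{1}{k!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right) + \cdots \\
&\quad + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\frac{2}{n}\cdot\frac{1}{n} \\
&< 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}.
\end{aligned}
\]
Taking the limit of this inequality as n → ∞, we get that
\[
e \le \sum_{n=0}^{\infty} \frac{1}{n!}.
\]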


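To get the reverse inequality, we observe that for every 2 ≤ k ≤ n,
\[
\begin{aligned}
\left(1 + \frac{1}{n}\right)^{n}
&\ge 2 + \frac{1}{2!}\left(1 - \frac{1}{n}\right)
   + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots \\
&\quad + \frac{1}{k!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right).
\end{aligned}
\]
Fixing k and taking the limit as n → ∞, we get that
\[
e \ge \sum_{j=0}^{k} \frac{1}{j!}.
\]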




Then, taking the limit as k → ∞, we find that
\[
e \ge \sum_{n=0}^{\infty} \frac{1}{n!},
\]


which proves the result. 


This series for e is very rapidly convergent. The next proposition gives an 
explicit error estimate. 


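Proposition 4.51. For every n ∈ N,
\[
0 < e - \sum_{k=0}^{n} \frac{1}{k!} < \frac{1}{n \cdot n!}.
\]

Proof. The lower bound is obvious. For the upper bound, we estimate the tail of
the series by a geometric series:
\[
\begin{aligned}
e - \sum_{k=0}^{n} \frac{1}{k!} &= \sum_{k=n+1}^{\infty} \frac{1}{k!} \\
&= \frac{1}{n!}\left(\frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots + \frac{1}{(n+1)(n+2)\cdots(n+k)} + \cdots\right) \\
&< \frac{1}{n!}\left(\frac{1}{n+1} + \frac{1}{(n+1)^{2}} + \cdots + \frac{1}{(n+1)^{k}} + \cdots\right) \\
&= \frac{1}{n!}\cdot\frac{1}{n}.
\end{aligned}
\]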
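
The error estimate can be checked in exact rational arithmetic. The sketch below is illustrative only: since e itself is not a rational number, it uses a much longer partial sum (60 terms, an arbitrary choice) as a stand-in for e, and the names `partial_sum`, `E_PROXY` and `tail_bound_holds` are hypothetical helpers, not notation from these notes.

```python
from fractions import Fraction
from math import factorial


def partial_sum(n: int) -> Fraction:
    """Exact value of sum_{k=0}^{n} 1/k!."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))


# Stand-in for e (assumption): a partial sum with many more terms.
# It is slightly smaller than e, so the upper bound below is tested
# against a value a little below the true tail.
E_PROXY = partial_sum(60)


def tail_bound_holds(n: int) -> bool:
    """Check 0 < E_PROXY - s_n < 1/(n * n!) exactly, for 1 <= n < 60."""
    gap = E_PROXY - partial_sum(n)
    return 0 < gap < Fraction(1, n * factorial(n))


if __name__ == "__main__":
    for n in range(1, 11):
        gap = float(E_PROXY - partial_sum(n))
        bound = float(Fraction(1, n * factorial(n)))
        print(n, gap, bound, tail_bound_holds(n))
```

The printed gaps shrink factorially fast, which is the sense in which the series for e is "very rapidly convergent."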


Theorem 4.52. The number e is irrational. 


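Proof. Suppose for contradiction that e = p/q for some p, q ∈ N. From the error
estimate in the preceding proposition, multiplied through by n!, we get
\[
0 < \frac{p \cdot n!}{q} - \sum_{k=0}^{n} \frac{n!}{k!} < \frac{1}{n}
\]
for every n ∈ N. The middle term is an integer if n ≥ q, which is impossible since it
lies strictly between 0 and 1/n.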


A real number is said to be algebraic if it is a root of a nonzero polynomial with
integer coefficients; otherwise it is transcendental. For example, every rational number
x = p/q, where p, q ∈ Z and q ≠ 0, is algebraic, since it is the solution of qx − p = 0;
and √2 is an irrational algebraic number, since it is a solution of x² − 2 = 0. It’s
possible to prove that e is not only irrational but transcendental, but the proof is
harder. Two other explicit examples of transcendental numbers are π and 2^{√2}.




There is a long history of the study of irrational and transcendental numbers. 
Euler (1737) proved the irrationality of e, and Lambert (1761) proved the
irrationality of π. The first proof of the existence of a transcendental number was
given by Liouville (1844), who showed that
\[
\sum_{n=1}^{\infty} \frac{1}{10^{n!}} = 0.11000100000000000000000100\ldots
\]
is transcendental. The transcendence of e was proved by Hermite (1873), the
transcendence of π by Lindemann (1882), and the transcendence of 2^{√2} independently
by Gelfond and Schneider (1934).


Cantor (1878) observed that the set of algebraic numbers is countably infinite, 
since there are countably many polynomials with integer coefficients, each such
polynomial has finitely many roots, and the countable union of countable sets is 
countable. This argument proved the existence of uncountably many transcen- 
dental numbers without exhibiting any explicit examples, which was a remarkable 
result given the difficulties mathematicians had encountered (and still encounter) 
in proving that specific numbers are transcendental. 


There remain many unsolved problems in this area. For example, it’s not known
if the Euler constant
\[
\gamma = \lim_{n\to\infty}\left(\sum_{k=1}^{n} \frac{1}{k} - \log n\right),
\]
introduced in an earlier example, is rational or irrational.


Chapter 5


Topology of the Real Numbers


In this chapter, we define some topological properties of the real numbers R and 
its subsets. 


5.1. Open sets 


Open sets are among the most important subsets of R. A collection of open sets 
is called a topology, and any property (such as convergence, compactness, or con- 
tinuity) that can be defined entirely in terms of open sets is called a topological 
property. 


Definition 5.1. A set G ⊂ R is open if for every x ∈ G there exists a δ > 0 such
that G ⊃ (x − δ, x + δ).


The entire set of real numbers R is obviously open, and the empty set ∅ is
open since it satisfies the definition vacuously (there is no x ∈ ∅).


Example 5.2. The open interval I = (0, 1) is open. If x ∈ I, then
\[
I \supset (x - \delta, x + \delta), \qquad \delta = \min\left(\frac{x}{2}, \frac{1 - x}{2}\right) > 0.
\]
Similarly, every finite or infinite open interval (a, b), (−∞, b), or (a, ∞) is open.
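
The choice of radius in this example can be checked mechanically. The sketch below assumes the radius δ = min(x/2, (1 − x)/2), which is one admissible choice among many; the function names are illustrative. Exact rational arithmetic is used so that points very close to the endpoints are handled without rounding.

```python
from fractions import Fraction


def delta_for(x: Fraction) -> Fraction:
    """One admissible radius for a point x of (0, 1): min(x/2, (1 - x)/2)."""
    if not (0 < x < 1):
        raise ValueError("x must lie in the open interval (0, 1)")
    return min(x / 2, (1 - x) / 2)


def neighborhood_inside_unit_interval(x: Fraction) -> bool:
    """Check that (x - delta, x + delta) is contained in (0, 1)."""
    d = delta_for(x)
    return d > 0 and 0 <= x - d and x + d <= 1


if __name__ == "__main__":
    for x in (Fraction(1, 2), Fraction(1, 1000), Fraction(999, 1000)):
        print(x, delta_for(x), neighborhood_inside_unit_interval(x))
```

The radius shrinks as x approaches an endpoint, but it is positive for every point of the interval, which is all the definition requires.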


Example 5.3. The half-open interval I = (0, 1] isn’t open, since 1 ∈ I and
(1 − δ, 1 + δ) isn’t a subset of I for any δ > 0, however small.


The next proposition states a characteristic property of open sets. 


Proposition 5.4. An arbitrary union of open sets is open, and a finite intersection 
of open sets is open. 




Proof. Suppose that {A_i ⊂ R : i ∈ I} is an arbitrary collection of open sets. If
x ∈ ⋃_{i∈I} A_i, then x ∈ A_i for some i ∈ I. Since A_i is open, there is δ > 0 such that
A_i ⊃ (x − δ, x + δ), and therefore
\[
\bigcup_{i \in I} A_i \supset (x - \delta, x + \delta),
\]
which proves that ⋃_{i∈I} A_i is open.

Suppose that {A_i ⊂ R : i = 1, 2, ..., n} is a finite collection of open sets. If
x ∈ ⋂_{i=1}^{n} A_i, then x ∈ A_i for every 1 ≤ i ≤ n. Since A_i is open, there is δ_i > 0
such that A_i ⊃ (x − δ_i, x + δ_i). Let
\[
\delta = \min\{\delta_1, \delta_2, \dots, \delta_n\} > 0.
\]
Then we see that
\[
\bigcap_{i=1}^{n} A_i \supset (x - \delta, x + \delta),
\]
which proves that ⋂_{i=1}^{n} A_i is open.


The previous proof fails for an infinite intersection of open sets, since we may
have δ_i > 0 for every i ∈ N but inf{δ_i : i ∈ N} = 0.


Example 5.5. The interval
\[
I_n = \left(-\frac{1}{n}, \frac{1}{n}\right)
\]
is open for every n ∈ N, but
\[
\bigcap_{n=1}^{\infty} I_n = \{0\}
\]
is not open.
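
To see concretely how an infinite intersection of open sets can fail to be open, consider the shrinking open intervals (−1/n, 1/n), taken here as an assumed illustrative family. The sketch below (with hypothetical helper names) exhibits, for any x ≠ 0, an index n whose interval excludes x, while 0 lies in every interval; so the intersection is the single point {0}.

```python
import math
from fractions import Fraction


def in_interval(x: Fraction, n: int) -> bool:
    """Is x in the open interval (-1/n, 1/n)?"""
    return Fraction(-1, n) < x < Fraction(1, n)


def excluding_index(x: Fraction) -> int:
    """For x != 0, return an n with x outside (-1/n, 1/n)."""
    if x == 0:
        raise ValueError("0 belongs to every interval (-1/n, 1/n)")
    # Any n >= 1/|x| works, since then 1/n <= |x|.
    return math.ceil(1 / abs(x))


if __name__ == "__main__":
    for x in (Fraction(1, 3), Fraction(-5, 7), Fraction(1, 10**6)):
        n = excluding_index(x)
        print(x, n, in_interval(x, n))
```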


In fact, every open set in R is a countable union of disjoint open intervals, but 
we won’t prove it here. 


5.1.1. Neighborhoods. Next, we introduce the notion of the neighborhood of a 
point, which often gives clearer, but equivalent, descriptions of topological concepts 
than ones that use open intervals. 


Definition 5.6. A set U ⊂ R is a neighborhood of a point x ∈ R if
\[
U \supset (x - \delta, x + \delta)
\]
for some δ > 0. The open interval (x − δ, x + δ) is called a δ-neighborhood of x.

A neighborhood of x needn’t be an open interval about x; it just has to contain
one. Some people require that a neighborhood also be an open set, but we don’t;
we’ll specify that a neighborhood is open if it’s needed.


Example 5.7. If a < x < b, then the closed interval [a, b] is a neighborhood of x,
since it contains the interval (x − δ, x + δ) for sufficiently small δ > 0. On the other
hand, [a, b] is not a neighborhood of the endpoints a, b since no open interval about
a or b is contained in [a, b].


We can restate the definition of open sets in terms of neighborhoods as follows. 




Definition 5.8. A set G ⊂ R is open if every x ∈ G has a neighborhood U such
that G ⊃ U.


In particular, an open set is itself a neighborhood of each of its points. 


We can restate Definition for the limit of a sequence in terms of neighbor- 
hoods as follows. 


Proposition 5.9. A sequence (x_n) of real numbers converges to a limit x ∈ R if
and only if for every neighborhood U of x there exists N ∈ N such that x_n ∈ U for
all n > N.


Proof. First suppose the condition in the proposition holds. Given ε > 0, let
U = (x − ε, x + ε) be an ε-neighborhood of x. Then there exists N ∈ N such that
x_n ∈ U for all n > N, which means that |x_n − x| < ε. Thus, x_n → x as n → ∞.

Conversely, suppose that x_n → x as n → ∞, and let U be a neighborhood
of x. Then there exists ε > 0 such that U ⊃ (x − ε, x + ε). Choose N ∈ N such
that |x_n − x| < ε for all n > N. Then x_n ∈ U for all n > N, which proves the
condition.


5.1.2. Relatively open sets. We define relatively open sets by restricting open 
sets in R to a subset. 


Definition 5.10. If A ⊂ R then B ⊂ A is relatively open in A, or open in A, if
B = A ∩ G where G is open in R.


Example 5.11. Let A = [0, 1]. Then the half-open intervals (a, 1] and [0, b) are
open in A for every 0 ≤ a < 1 and 0 < b ≤ 1, since
\[
(a, 1] = [0, 1] \cap (a, 2), \qquad [0, b) = [0, 1] \cap (-1, b)
\]
and (a, 2), (−1, b) are open in R. By contrast, neither (a, 1] nor [0, b) is open in R.


The neighborhood definition of open sets generalizes to relatively open sets. 
First, we define relative neighborhoods in the obvious way. 


Definition 5.12. If A ⊂ R then a relative neighborhood in A of a point x ∈ A is
a set V = A ∩ U where U is a neighborhood of x in R.


As we show next, a set is relatively open if and only if it contains a relative 
neighborhood of every point. 


Proposition 5.13. A set B ⊂ A is relatively open in A if and only if every x ∈ B
has a relative neighborhood V in A such that B ⊃ V.


Proof. Assume that B is open in A. Then B = A ∩ G where G is open in R. If
x ∈ B, then x ∈ G, and since G is open, there is a neighborhood U of x in R such
that G ⊃ U. Then V = A ∩ U is a relative neighborhood of x with B ⊃ V.

Conversely, assume that every point x ∈ B has a relative neighborhood V_x =
A ∩ U_x in A such that V_x ⊂ B, where U_x is a neighborhood of x in R. Since U_x is
a neighborhood of x, it contains an open neighborhood G_x ⊂ U_x. We claim
that B = A ∩ G where
\[
G = \bigcup_{x \in B} G_x.
\]




It then follows that G is open, since it’s a union of open sets, and therefore B = A ∩ G
is relatively open in A.

To prove the claim, we show that B ⊂ A ∩ G and B ⊃ A ∩ G. First, B ⊂ A ∩ G
since x ∈ A ∩ G_x ⊂ A ∩ G for every x ∈ B. Second, A ∩ G_x ⊂ A ∩ U_x ⊂ B for every
x ∈ B. Taking the union over x ∈ B, we get that A ∩ G ⊂ B.


5.2. Closed sets

Sets are not doors. (Attributed to James Munkres.)

Closed sets are defined topologically as complements of open sets.

Definition 5.14. A set F ⊂ R is closed if F^c = {x ∈ R : x ∉ F} is open.

Example 5.15. The closed interval I = [0, 1] is closed since
\[
I^c = (-\infty, 0) \cup (1, \infty)
\]
is a union of open intervals, and therefore it’s open. Similarly, every finite or infinite
closed interval [a, b], (−∞, b], or [a, ∞) is closed.


The empty set ∅ and R are both open and closed; they’re the only such sets.
Most subsets of R are neither open nor closed (so, unlike doors, “not open” doesn’t
mean “closed” and “not closed” doesn’t mean “open”).


Example 5.16. The half-open interval I = (0, 1] isn’t open because it doesn’t
contain any neighborhood of the right endpoint 1 ∈ I. Its complement
\[
I^c = (-\infty, 0] \cup (1, \infty)
\]
isn’t open either, since it doesn’t contain any neighborhood of 0 ∈ I^c. Thus, I isn’t
closed either.


Example 5.17. The set of rational numbers Q ⊂ R is neither open nor closed.
It isn’t open because every neighborhood of a rational number contains irrational
numbers, and its complement isn’t open because every neighborhood of an irrational
number contains rational numbers.


Closed sets can also be characterized in terms of sequences. 


Proposition 5.18. A set F ⊂ R is closed if and only if the limit of every convergent
sequence in F belongs to F.


Proof. First suppose that F is closed and (x_n) is a convergent sequence of points
x_n ∈ F such that x_n → x. Then every neighborhood of x contains points x_n ∈ F.
It follows that x ∉ F^c, since F^c is open and every y ∈ F^c has a neighborhood
U ⊂ F^c that contains no points in F. Therefore, x ∈ F.


Conversely, suppose that the limit of every convergent sequence of points in F
belongs to F. Let x ∈ F^c. Then x must have a neighborhood U ⊂ F^c; otherwise for
every n ∈ N there exists x_n ∈ F such that x_n ∈ (x − 1/n, x + 1/n), so x = lim x_n,
and x is the limit of a sequence in F, contradicting x ∉ F. Thus, F^c is open and F
is closed.




Example 5.19. To verify that the closed interval [0, 1] is closed using the
sequential characterization of closed sets, suppose that (x_n) is a convergent
sequence in [0, 1]. Then 0 ≤ x_n ≤ 1 for all n ∈ N, and since limits preserve
(non-strict) inequalities, we have
\[
0 \le \lim_{n\to\infty} x_n \le 1,
\]
meaning that the limit belongs to [0, 1]. On the other hand, the half-open interval
I = (0, 1] isn’t closed since, for example, (1/n) is a convergent sequence in I whose
limit 0 doesn’t belong to I.


Closed sets have complementary properties to those of open sets stated in 
Proposition 


Proposition 5.20. An arbitrary intersection of closed sets is closed, and a finite 
union of closed sets is closed. 


Proof. If {F_i : i ∈ I} is an arbitrary collection of closed sets, then every F_i^c is
open. By De Morgan’s laws in Proposition 1.23, we have
\[
\left(\bigcap_{i \in I} F_i\right)^{c} = \bigcup_{i \in I} F_i^{c},
\]
which is open, being a union of open sets. Thus ⋂_{i∈I} F_i is closed. Similarly, the
complement of a finite union of closed sets is open, since
\[
\left(\bigcup_{i=1}^{n} F_i\right)^{c} = \bigcap_{i=1}^{n} F_i^{c},
\]
so a finite union of closed sets is closed.


The union of infinitely many closed sets needn’t be closed. 


Example 5.21. If I_n is the closed interval
\[
I_n = \left[\frac{1}{n}, 1 - \frac{1}{n}\right],
\]
then the union of the I_n is an open interval
\[
\bigcup_{n=1}^{\infty} I_n = (0, 1).
\]
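
The claim that closed intervals of the form [1/n, 1 − 1/n] exhaust the open interval (0, 1) can be made explicit: each x in (0, 1) lies in such an interval once n is at least max(1/x, 1/(1 − x)), while the endpoints 0 and 1 lie in none of them. The following is a small sketch with hypothetical helper names, using exact rational arithmetic.

```python
import math
from fractions import Fraction


def in_closed_interval(x: Fraction, n: int) -> bool:
    """Is x in the closed interval [1/n, 1 - 1/n]?"""
    return Fraction(1, n) <= x <= 1 - Fraction(1, n)


def containing_index(x: Fraction) -> int:
    """For 0 < x < 1, return an n such that 1/n <= x <= 1 - 1/n."""
    if not (0 < x < 1):
        raise ValueError("x must lie in the open interval (0, 1)")
    # n >= 1/x gives 1/n <= x; n >= 1/(1 - x) gives x <= 1 - 1/n.
    return max(math.ceil(1 / x), math.ceil(1 / (1 - x)))


if __name__ == "__main__":
    for x in (Fraction(1, 2), Fraction(1, 1000), Fraction(999, 1000)):
        n = containing_index(x)
        print(x, n, in_closed_interval(x, n))
```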


If A is a subset of R, it is useful to consider different ways in which a point
x ∈ R can belong to A or be “close” to A.


Definition 5.22. Let A ⊂ R be a subset of R. Then x ∈ R is:


(1) an interior point of A if there exists δ > 0 such that A ⊃ (x − δ, x + δ);


(2) an isolated point of A if x ∈ A and there exists δ > 0 such that x is the only
point in A that belongs to the interval (x − δ, x + δ);


(3) a boundary point of A if for every δ > 0 the interval (x − δ, x + δ) contains
points in A and points not in A;


(4) an accumulation point of A if for every δ > 0 the interval (x − δ, x + δ) contains
a point in A that is distinct from x.




When the set A is understood from the context, we refer, for example, to an 
“interior point.” 

Interior and isolated points of a set belong to the set, whereas boundary and
accumulation points may or may not belong to the set. In the definition of a
boundary point x, we allow the possibility that x itself is a point in A belonging
to (x − δ, x + δ), but in the definition of an accumulation point, we consider only
points in A belonging to (x − δ, x + δ) that are distinct from x. Thus an isolated
point is a boundary point, but it isn’t an accumulation point. Accumulation points
are also called cluster points or limit points.


We illustrate these definitions with a number of examples. 


Example 5.23. Let I = (a, b) be an open interval and J = [a, b] a closed interval.
Then the set of interior points of I or J is (a, b), and the set of boundary points
consists of the two endpoints {a, b}. The set of accumulation points of I or J is
the closed interval [a, b] and I, J have no isolated points. Thus, I, J have the same
interior, isolated, boundary and accumulation points, but J contains its boundary
points and all of its accumulation points, while I does not.


Example 5.24. Let a < c < b and suppose that
\[
A = (a, c) \cup (c, b)
\]
is an open interval punctured at c. Then the set of interior points is A, the set
of boundary points is {a, b, c}, the set of accumulation points is the closed interval
[a, b], and there are no isolated points.


Example 5.25. Let
\[
A = \left\{ \frac{1}{n} : n \in \mathbb{N} \right\}.
\]
Then every point of A is an isolated point, since a sufficiently small interval about
1/n doesn’t contain 1/m for any integer m ≠ n, and A has no interior points. The
set of boundary points of A is A ∪ {0}. The point 0 ∉ A is the only accumulation
point of A, since every open interval about 0 contains 1/n for sufficiently large n.
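
For the set of reciprocals 1/n, "sufficiently small" can be quantified. The nearest neighbour of 1/n in the set is 1/(n + 1), so the radius δ = 1/n − 1/(n + 1) isolates 1/n, whereas every interval (−δ, δ) about 0 contains a point 1/n. The sketch below checks this over a finite range of indices; the function names and the cut-off `m_max` are assumptions made for the illustration.

```python
import math
from fractions import Fraction


def isolation_radius(n: int) -> Fraction:
    """delta = 1/n - 1/(n+1): distance from 1/n to its nearest neighbour in the set."""
    return Fraction(1, n) - Fraction(1, n + 1)


def is_isolated_within(n: int, m_max: int) -> bool:
    """No 1/m with m != n and 1 <= m <= m_max lies in (1/n - delta, 1/n + delta)."""
    x, d = Fraction(1, n), isolation_radius(n)
    return all(
        not (x - d < Fraction(1, m) < x + d)
        for m in range(1, m_max + 1)
        if m != n
    )


def point_near_zero(delta: Fraction) -> Fraction:
    """Return some 1/n with 0 < 1/n < delta, showing 0 is an accumulation point."""
    n = math.floor(1 / delta) + 1  # n > 1/delta, so 1/n < delta
    return Fraction(1, n)


if __name__ == "__main__":
    print(all(is_isolated_within(n, 500) for n in range(1, 50)))
    print(point_near_zero(Fraction(1, 1000)))
```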


Example 5.26. The set N of natural numbers has no interior or accumulation 
points. Every point of N is both a boundary point and an isolated point. 


Example 5.27. The set Q of rational numbers has no interior or isolated points, 
and every real number is both a boundary and accumulation point of Q. 


Example 5.28. The Cantor set C defined in Section 5.5 below has no interior
points and no isolated points. The set of accumulation points and the set of
boundary points of C is equal to C.


The following proposition gives a sequential definition of an accumulation point. 


Proposition 5.29. A point x ∈ R is an accumulation point of A ⊂ R if and only
if there is a sequence (x_n) in A with x_n ≠ x for every n ∈ N such that x_n → x as
n → ∞.


Proof. Suppose x ∈ R is an accumulation point of A. The definition of an
accumulation point implies that for every n ∈ N there exists x_n ∈ A \ {x} such that
x_n ∈ (x − 1/n, x + 1/n). It follows that x_n → x as n → ∞.




Conversely, if x is the limit of a sequence (x_n) in A with x_n ≠ x, and U is a
neighborhood of x, then x_n ∈ U \ {x} for sufficiently large n ∈ N, which proves
that x is an accumulation point of A.


Example 5.30. If
\[
A = \left\{ \frac{1}{n} : n \in \mathbb{N} \right\},
\]
then 0 is an accumulation point of A, since (1/n) is a sequence in A such that
1/n → 0 as n → ∞. On the other hand, 1 is not an accumulation point of A since
the only sequences in A that converge to 1 are the ones whose terms eventually
equal 1, and the terms are required to be distinct from 1.


We can also characterize open and closed sets in terms of their interior and 
accumulation points. 


Proposition 5.31. A set A ⊂ R is:


(1) open if and only if every point of A is an interior point;
(2) closed if and only if every accumulation point belongs to A.


Proof. If A is open, then it is an immediate consequence of the definitions that
every point in A is an interior point. Conversely, if every point x ∈ A is an interior
point, then there is an open neighborhood U_x ⊂ A of x, so
\[
A = \bigcup_{x \in A} U_x
\]
is a union of open sets, and therefore A is open.

If A is closed and x is an accumulation point, then the sequential characterizations
of closed sets and of accumulation points in the propositions above imply that
x ∈ A. Conversely, if every accumulation point of A belongs to A, then every
x ∈ A^c has a neighborhood with no points in A, so A^c is open and A is closed.


5.3. Compact sets 


The significance of compact sets is not as immediately apparent as the significance
of open sets, but the notion of compactness plays a central role in analysis. One
indication of its importance already appears in the Bolzano-Weierstrass theorem
(Theorem 3.57).


Compact sets may be characterized in many different ways, and we will give 
the two most important definitions. One is based on sequences (every sequence has 
a convergent subsequence), and the other is based on open sets (every open cover 
has a finite subcover). 


We will prove that a subset of R is compact if and only if it is closed and
bounded. For example, every closed, bounded interval [a, b] is compact. There are,
however, many other compact subsets of R. In Section 5.5 we describe a particularly
interesting example called the Cantor set.

We emphasize that although the compact sets in R are exactly the closed and 
bounded sets, this isn’t their fundamental definition; rather it’s an explicit descrip- 
tion of what compact sets look like in R. In more general spaces than R, closed and 




bounded sets need not be compact, and it’s the properties defining compactness
that are the crucial ones. Chapter 13 has further explanation.


5.3.1. Sequential definition. Intuitively, a compact set confines every sequence 
of points in the set so much that the sequence must accumulate at some point of 
the set. This implies that a subsequence converges to an accumulation point and 
leads to the following definition. 


Definition 5.32. A set K ⊂ R is sequentially compact if every sequence in K has
a convergent subsequence whose limit belongs to K.


Note that we require that the subsequence converges to a point in K, not to a
point outside K.


We usually abbreviate “sequentially compact” to “compact,” but sometimes 
we need to distinguish explicitly between the sequential definition of compactness 
given above and the topological definition given in Definition below. 


Example 5.33. The open interval I = (0, 1) is not compact. The sequence (1/n)
in I converges to 0, so every subsequence also converges to 0 ∉ I. Therefore, (1/n)
has no convergent subsequence whose limit belongs to I.


Example 5.34. The set A = Q ∩ [0, 1] of rational numbers in [0, 1] is not compact.
If (r_n) is a sequence of rational numbers 0 ≤ r_n ≤ 1 that converges to 1/√2, then
every subsequence also converges to 1/√2 ∉ A, so (r_n) has no subsequence that
converges to a point in A.


Example 5.35. The set N is closed, but it is not compact. The sequence (n) in N 
has no convergent subsequence since every subsequence diverges to infinity. 


As these examples illustrate, a compact set must be closed and bounded.
Conversely, the Bolzano-Weierstrass theorem implies that every closed, bounded
subset of R is compact. This fact may be taken as an alternative statement of the
theorem.


Theorem 5.36 (Bolzano-Weierstrass). A subset of R is sequentially compact if 
and only if it is closed and bounded. 


Proof. First, assume that K ⊂ R is sequentially compact. Let (x_n) be a sequence
in K that converges to x ∈ R. Then every subsequence of (x_n) also converges to x,
so the compactness of K implies that x ∈ K. It follows from Proposition 5.18 that
K is closed. Next, suppose for contradiction that K is unbounded. Then there is
a sequence (x_n) in K such that |x_n| → ∞ as n → ∞. Every subsequence of (x_n)
is also unbounded and therefore diverges, so (x_n) has no convergent subsequence.
This contradicts the assumption that K is sequentially compact, so K is bounded.

Conversely, assume that K ⊂ R is closed and bounded. Let (x_n) be a sequence
in K. Then (x_n) is bounded since K is bounded, and Theorem 3.57 implies that
(x_n) has a convergent subsequence. Since K is closed the limit of this subsequence
belongs to K, so K is sequentially compact.


Example 5.37. Every closed, bounded interval [a,b] is compact. 




Example 5.38. Let
\[
A = \left\{ \frac{1}{n} : n \in \mathbb{N} \right\}.
\]
Then A is not compact, since it isn’t closed. However, the set K = A ∪ {0} is closed
and bounded, so it is compact.


Example 5.39. The Cantor set defined in Section 5.5 is compact.


For later use, we prove a useful property of compact sets in R which follows
from Theorem 5.36.


Proposition 5.40. If K ⊂ R is compact, then K has a maximum and minimum.


Proof. Since K is compact it is bounded and therefore it has a (finite) supremum
M = sup K. From the definition of the supremum, for every n ∈ N there exists
x_n ∈ K such that
\[
M - \frac{1}{n} < x_n \le M.
\]
It follows from the ‘squeeze’ theorem that x_n → M as n → ∞. Since K is closed,
M ∈ K, which proves that K has a maximum. A similar argument shows that
m = inf K belongs to K, so K has a minimum.


Example 5.41. The bounded closed interval [0,1] is compact and its maximum 1 
and minimum 0 belong to the set, while the open interval (0, 1) is not compact and 
its supremum 1 and infimum 0 do not belong to the set. The unbounded, closed 
interval [0, ∞) is not compact, and it has no maximum.


Example 5.42. The set A in Example is not compact and its infimum 0 does 
not belong to the set, but the compact set K has 0 as a minimum value. 


Compact sets have the following nonempty intersection property. 


Theorem 5.43. Let {K_n : n ∈ N} be a decreasing sequence of nonempty compact
sets of real numbers, meaning that
\[
K_1 \supset K_2 \supset \cdots \supset K_n \supset K_{n+1} \supset \cdots,
\]
and K_n ≠ ∅. Then
\[
\bigcap_{n=1}^{\infty} K_n \ne \emptyset.
\]
Moreover, if diam K_n → 0 as n → ∞, then the intersection consists of a single
point.


Proof. For each n ∈ N, choose x_n ∈ K_n. Since (x_n) is a sequence in the compact
set K_1, it has a convergent subsequence (x_{n_k}) with x_{n_k} → x as k → ∞. Then
x_{n_k} ∈ K_n for all k sufficiently large that n_k ≥ n. Since a “tail” of the subsequence
belongs to K_n and K_n is closed, we have x ∈ K_n for every n ∈ N. Hence, x ∈ ⋂ K_n,
and the intersection is nonempty.

If x, y ∈ ⋂ K_n, then x, y ∈ K_n for every n ∈ N, so |x − y| ≤ diam K_n. If
diam K_n → 0 as n → ∞, then |x − y| = 0, so x = y and ⋂ K_n consists of a single
point.




We refer to a decreasing sequence of sets as a nested sequence. In the case
when each K_n = [a_n, b_n] is a compact interval, the preceding result is called the
nested interval theorem.


Example 5.44. The nested compact intervals [0, 1 + 1/n] have nonempty
intersection [0, 1]. Here, diam [0, 1 + 1/n] → 1 as n → ∞, and the intersection consists
of an interval. The nested compact intervals [0, 1/n] have nonempty intersection
{0}, which consists of a single point since diam [0, 1/n] → 0 as n → ∞. On the
other hand, the nested half-open intervals (0, 1/n] have empty intersection, as do
the nested unbounded, closed intervals [n, ∞). In particular, Theorem 5.43 doesn’t
hold if we replace “compact” by “closed.”
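
A familiar computational instance of nested compact intervals with diameters shrinking to zero is bisection. The sketch below is an illustration that is not taken from these notes: it builds closed intervals [a_k, b_k] with rational endpoints satisfying a_k² ≤ 2 ≤ b_k², each half as long as the previous one. The nested interval theorem guarantees that exactly one real number lies in all of them, and that number can only be √2.

```python
from fractions import Fraction
from typing import List, Tuple


def nested_intervals(steps: int) -> List[Tuple[Fraction, Fraction]]:
    """Closed intervals [a_k, b_k] with a_k^2 <= 2 <= b_k^2, halving at each step."""
    a, b = Fraction(1), Fraction(2)  # 1^2 <= 2 <= 2^2
    out = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid  # keep the right half [mid, b]
        else:
            b = mid  # keep the left half [a, mid]
        out.append((a, b))
    return out


if __name__ == "__main__":
    for k, (a, b) in enumerate(nested_intervals(20)):
        if k % 5 == 0:
            print(k, float(a), float(b), float(b - a))
```

Note that the common point is not rational, so the same construction carried out inside Q would produce nested sets with empty intersection.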


Example 5.45. Define a nested sequence A_1 ⊃ A_2 ⊃ ... of non-compact sets by
\[
A_n = \left\{ \frac{1}{k} : k = n, n+1, n+2, \dots \right\},
\]
so A_1 = A where A is the set considered in Example 5.38. Then
\[
\bigcap_{n=1}^{\infty} A_n = \emptyset.
\]
If we add 0 to the A_n to make them compact and define K_n = A_n ∪ {0}, then the
intersection
\[
\bigcap_{n=1}^{\infty} K_n = \{0\}
\]
is nonempty.


5.3.2. Topological definition. To give a topological definition of compactness 
in terms of open sets, we introduce the notion of an open cover of a set. 


Definition 5.46. Let A ⊂ R. A cover of A is a collection of sets {A_i ⊂ R : i ∈ I}
whose union contains A,
\[
\bigcup_{i \in I} A_i \supset A.
\]
An open cover of A is a cover such that A_i is open for every i ∈ I.


Example 5.47. Let A_i = (1/i, 2). Then C = {A_i : i ∈ N} is an open cover of
(0, 1], since
\[
\bigcup_{i=1}^{\infty} \left(\frac{1}{i}, 2\right) = (0, 2) \supset (0, 1].
\]
On the other hand, C is not a cover of [0, 1] since its union does not contain 0. If,
for any δ > 0, we add the interval B = (−δ, δ) to C, then
\[
\bigcup_{i=1}^{\infty} \left(\frac{1}{i}, 2\right) \cup B = (-\delta, 2) \supset [0, 1],
\]
so C′ = C ∪ {B} is an open cover of [0, 1].




Example 5.48. If A_i = (i − 1, i + 1), then {A_i : i ∈ Z} is an open cover of R.
On the other hand, if B_i = (i, i + 1), then {B_i : i ∈ Z} is not an open cover of R,
since its union doesn’t contain any of the integers. Finally, if C_i = [i, i + 1), then
{C_i : i ∈ Z} is a cover of R by disjoint, half-open intervals, but it isn’t an open
cover. Thus, to get an open cover, we need the intervals to “overlap”.


Example 5.49. Let {r_i : i ∈ N} be an enumeration of the rational numbers
r_i ∈ [0, 1], and fix ε > 0. Define A_i = (r_i − ε, r_i + ε). Then {A_i : i ∈ N} is an
open cover of [0, 1] since every irrational number x ∈ [0, 1] can be approximated
to within ε by some rational number. Similarly, if I = [0, 1] \ Q denotes the set of
irrational numbers in [0, 1], then {(x − ε, x + ε) : x ∈ I} is an open cover of [0, 1].
In this case, the cover consists of uncountably many sets.


Next, we define subcovers.

Definition 5.50. Suppose that C = {A_i ⊂ R : i ∈ I} is a cover of A ⊂ R. A
subcover S of C is a sub-collection S ⊂ C that covers A, meaning that
\[
S = \{A_{i_k} \in C : k \in J\}, \qquad \bigcup_{k \in J} A_{i_k} \supset A.
\]
A finite subcover is a subcover {A_{i_1}, A_{i_2}, ..., A_{i_n}} that consists of finitely many
sets.


Example 5.51. Consider the cover C = {A_i : i ∈ N} of (0, 1] considered above,
where A_i = (1/i, 2). Then {A_{2j} : j ∈ N} is a subcover. There is, however, no finite
subcover {A_{i_1}, A_{i_2}, ..., A_{i_n}} since if N = max{i_1, i_2, ..., i_n} then
\[
\bigcup_{k=1}^{n} A_{i_k} = \left(\frac{1}{N}, 2\right),
\]
which does not contain the points x ∈ (0, 1) with 0 < x ≤ 1/N. On the other hand,
the cover C′ = C ∪ {(−δ, δ)} of [0, 1] does have a finite subcover. For example, if
N ∈ N is such that 1/N < δ, then
\[
\left\{ \left(\frac{1}{N}, 2\right), (-\delta, \delta) \right\}
\]
is a finite subcover of [0, 1] consisting of two sets (whose union is the same as the
original cover).
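
Whether a given finite family of open intervals covers a closed interval [a, b] can be decided by a left-to-right sweep. The following is a minimal sketch under the assumption that every set in the family is a bounded open interval given by its two endpoints; `covers_closed_interval` is a hypothetical helper, not notation from these notes. At each stage it looks at the leftmost point not yet known to be covered and, among the intervals containing that point, takes the one reaching furthest to the right.

```python
from fractions import Fraction
from typing import Iterable, Tuple

Interval = Tuple[Fraction, Fraction]  # an open interval (left, right)


def covers_closed_interval(intervals: Iterable[Interval], a: Fraction, b: Fraction) -> bool:
    """Decide whether finitely many open intervals cover the closed interval [a, b]."""
    family = list(intervals)
    point = a  # leftmost point of [a, b] not yet known to be covered
    while True:
        # Among the intervals containing `point`, find the furthest right endpoint.
        reach = max((right for left, right in family if left < point < right), default=None)
        if reach is None:
            return False  # `point` itself is not covered
        if reach > b:
            return True  # [point, b] lies inside the chosen interval
        point = reach  # an open interval does not contain its right endpoint


if __name__ == "__main__":
    delta, N = Fraction(1, 5), 10  # 1/N < delta
    two_sets = [(Fraction(1, N), Fraction(2)), (-delta, delta)]
    print(covers_closed_interval(two_sets, Fraction(0), Fraction(1)))  # True
    without_b = [(Fraction(1, i), Fraction(2)) for i in range(1, 200)]
    print(covers_closed_interval(without_b, Fraction(0), Fraction(1)))  # False
```

The loop terminates because `point` strictly increases through the finitely many right endpoints of the family.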


Having introduced this terminology, we give a topological definition of compact 
sets. 


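Definition 5.52. A set K ⊂ R is compact if every open cover of K has a finite
subcover.


First, we illustrate this definition with several examples.

Example 5.53. The collection of open intervals
\[
\{A_i : i \in \mathbb{N}\}, \qquad A_i = (i - 1, i + 1)
\]
is an open cover of the natural numbers N, since
\[
\bigcup_{i=1}^{\infty} A_i = (0, \infty) \supset \mathbb{N}.
\]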




However, no finite sub-collection
\[
\{A_{i_1}, A_{i_2}, \dots, A_{i_n}\}
\]
covers N, since if N = max{i_1, i_2, ..., i_n}, then
\[
\bigcup_{k=1}^{n} A_{i_k} \subset (0, N + 1)
\]
so its union does not contain sufficiently large integers n ≥ N + 1. Thus, N is
not compact.


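Example 5.54. Consider the open intervals
\[
A_i = \left(\frac{1}{2^{i}} - \frac{1}{2^{i+1}}, \frac{1}{2^{i}} + \frac{1}{2^{i+1}}\right),
\]
which get smaller as they get closer to 0. Then {A_i : i = 0, 1, 2, ...} is an open
cover of the open interval (0, 1); in fact
\[
\bigcup_{i=0}^{\infty} A_i = \left(0, \frac{3}{2}\right) \supset (0, 1).
\]
However, no finite sub-collection
\[
\{A_{i_1}, A_{i_2}, \dots, A_{i_n}\}
\]
of intervals covers (0, 1), since if N = max{i_1, i_2, ..., i_n}, then
\[
\bigcup_{k=1}^{n} A_{i_k} \subset \left(\frac{1}{2^{N}} - \frac{1}{2^{N+1}}, \frac{3}{2}\right)
\]
so it does not contain the points in (0, 1) that are sufficiently close to 0. Thus, (0, 1)
is not compact. The cover by the intervals (1/i, 2) considered earlier gives another
example of an open cover of (0, 1) with no finite subcover.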


Example 5.55. The collection of open intervals {A_0, A_1, A_2, ...} in Example 5.54
isn’t an open cover of the closed interval [0, 1] since 0 doesn’t belong to their union.
We can get an open cover {A_0, A_1, A_2, ..., B} of [0, 1] by adding to the A_i an open
interval B = (−δ, δ), where δ > 0 is arbitrarily small. In that case, if we choose
n ∈ N sufficiently large that
\[
\frac{1}{2^{n}} - \frac{1}{2^{n+1}} < \delta,
\]
then {A_0, A_1, A_2, ..., A_n, B} is a finite subcover of [0, 1] since
\[
\bigcup_{i=0}^{n} A_i \cup B = \left(-\delta, \frac{3}{2}\right) \supset [0, 1].
\]
Points sufficiently close to 0 belong to B, while points further away belong to A_i
for some 0 ≤ i ≤ n. The open cover of [0, 1] considered earlier, obtained by adding
(−δ, δ) to the intervals (1/i, 2), is similar.


As the previous example suggests, and as follows from the next theorem, every 
open cover of [0,1] has a finite subcover, and [0, 1] is compact. 


Theorem 5.56 (Heine-Borel). A subset of R is compact if and only if it is closed 
and bounded. 




Proof. The most important direction of the theorem is that a closed, bounded set
is compact.

First, we prove that a closed, bounded interval K = [a, b] is compact. Suppose
that C = {A_i : i ∈ I} is an open cover of [a, b], and let
\[
B = \{x \in [a, b] : [a, x] \text{ has a finite subcover } S \subset C\}.
\]
We claim that sup B = b. The idea of the proof is that any open cover of [a, x]
must cover a larger interval since the open set that contains x extends past x.

Since C covers [a, b], there exists a set A_i ∈ C with a ∈ A_i, so [a, a] = {a} has a
subcover consisting of a single set, and a ∈ B. Thus, B is non-empty and bounded
from above by b, so c = sup B ≤ b exists. Assume for contradiction that c < b.
Then [a, c] has a finite subcover
\[
\{A_{i_1}, A_{i_2}, \dots, A_{i_n}\}
\]
with c ∈ A_{i_k} for some 1 ≤ k ≤ n. Since A_{i_k} is open and a ≤ c < b, there exists
δ > 0 such that [c, c + δ) ⊂ A_{i_k} ∩ [a, b]. Then {A_{i_1}, A_{i_2}, ..., A_{i_n}} is a finite subcover
of [a, x] for c < x < c + δ, contradicting the definition of c, so sup B = b. Moreover,
the following argument shows that, in fact, b = max B.

Since C covers [a, b], there is an open set A_{i_0} ∈ C such that b ∈ A_{i_0}. Then
(b − δ, b + δ) ⊂ A_{i_0} for some δ > 0, and since sup B = b there exists c ∈ B
such that b − δ < c ≤ b. Let {A_{i_1}, ..., A_{i_n}} be a finite subcover of [a, c]. Then
{A_{i_0}, A_{i_1}, ..., A_{i_n}} is a finite subcover of [a, b], which proves that [a, b] is compact.

Now suppose that K ⊂ R is a closed, bounded set, and let C = {A_i : i ∈ I}
be an open cover of K. Since K is bounded, K ⊂ [a, b] for some closed bounded
interval [a, b], and, since K is closed, C′ = C ∪ {K^c} is an open cover of [a, b].
From what we have just proved, [a, b] has a finite subcover that is included in C′.
Omitting K^c from this subcover, if necessary, we get a finite subcover of K that is
included in the original cover C.


To prove the converse, suppose that K ⊂ R is compact. Let A_i = (−i, i). Then
\[
\bigcup_{i=1}^{\infty} A_i = \mathbb{R} \supset K,
\]
so {A_i : i ∈ N} is an open cover of K, which has a finite subcover {A_{i_1}, A_{i_2}, ..., A_{i_n}}.
Let N = max{i_1, i_2, ..., i_n}. Then
\[
K \subset \bigcup_{k=1}^{n} A_{i_k} = (-N, N),
\]
so K is bounded.


To prove that K is closed, we prove that K^c is open. Suppose that x ∈ K^c.
For i ∈ N, let
\[
A_i = \left[x - \frac{1}{i}, x + \frac{1}{i}\right]^{c} = \left(-\infty, x - \frac{1}{i}\right) \cup \left(x + \frac{1}{i}, \infty\right).
\]
Then {A_i : i ∈ N} is an open cover of K, since
\[
\bigcup_{i=1}^{\infty} A_i = (-\infty, x) \cup (x, \infty) \supset K.
\]




Since K is compact, there is a finite subcover {A_{i_1}, A_{i_2}, ..., A_{i_n}}. Let N =
max{i_1, i_2, ..., i_n}. Then
\[
K \subset \bigcup_{k=1}^{n} A_{i_k} = \left(-\infty, x - \frac{1}{N}\right) \cup \left(x + \frac{1}{N}, \infty\right),
\]
which implies that (x − 1/N, x + 1/N) ⊂ K^c. This proves that K^c is open and K
is closed.


The following corollary is an immediate consequence of what we have proved. 


Corollary 5.57. A subset of R is compact if and only if it is sequentially compact. 


Proof. By Theorem 5.36 and Theorem 5.56, a subset of $\mathbb{R}$ is compact or sequentially
compact if and only if it is closed and bounded.


Corollary 5.57 generalizes to an arbitrary metric space, where a set is compact
if and only if it is sequentially compact, although a different proof is required. By
contrast, Theorem 5.36 and Theorem 5.56 do not hold in an arbitrary metric space,
where a closed, bounded set need not be compact.


5.4. Connected sets 


A connected set is, roughly speaking, a set that cannot be divided into “separated” 
parts. The formal definition is as follows. 


Definition 5.58. A set of real numbers $A \subset \mathbb{R}$ is disconnected if there are disjoint
open sets $U, V \subset \mathbb{R}$ such that $A \cap U$ and $A \cap V$ are nonempty and
\[
A = (A \cap U) \cup (A \cap V).
\]
A set is connected if it is not disconnected.


The condition $A = (A \cap U) \cup (A \cap V)$ is equivalent to $U \cup V \supset A$. If $A$ is
disconnected as in the definition, then we say that the open sets $U$, $V$ separate $A$.


It is easy to give examples of disconnected sets. As the following examples 
illustrate, any set of real numbers that is “missing” a point is disconnected. 


Example 5.59. The set $A = \{0,1\}$ consisting of two points is disconnected. For example,
let $U = (-1/2, 1/2)$ and $V = (1/2, 3/2)$. Then $U$, $V$ are open and $U \cap V = \emptyset$.
Furthermore, $A \cap U = \{0\}$ and $A \cap V = \{1\}$ are nonempty, and $A = (A \cap U) \cup (A \cap V)$.
Similarly, the union of half-open intervals $[0, 1/2) \cup (1/2, 1]$ is disconnected.


Example 5.60. The set $\mathbb{R} \setminus \{0\}$ is disconnected since $\mathbb{R} \setminus \{0\} = (-\infty, 0) \cup (0, \infty)$.

Example 5.61. The set $\mathbb{Q}$ of rational numbers is disconnected. For example, let
$U = (-\infty, \sqrt{2})$ and $V = (\sqrt{2}, \infty)$. Then $U$, $V$ are disjoint open sets, $\mathbb{Q} \cap U$ and
$\mathbb{Q} \cap V$ are nonempty, and $U \cup V = \mathbb{R} \setminus \{\sqrt{2}\} \supset \mathbb{Q}$.


In general, it is harder to prove that a set is connected than disconnected, 
because one has to show that there is no way to separate it by open sets. However, 
the ordering properties of R enable us to characterize its connected sets: they are 
exactly the intervals. 


First, we give a precise definition of an interval. 




Definition 5.62. A set of real numbers $I \subset \mathbb{R}$ is an interval if $x, y \in I$ and $x < y$
implies that $z \in I$ for every $x < z < y$.


That is, an interval is a set with the property that it contains all the points 
between any two points in the set. 


We claim that, according to this definition, an interval contains all the points
that lie between its infimum and supremum. The infimum and supremum may be
finite or infinite, and they may or may not belong to the interval. Depending on
which of these possibilities occur, we see that an interval is any open, closed,
half-open, bounded, or unbounded interval of the form
\[
\emptyset, \quad (a,b), \quad [a,b], \quad [a,b), \quad (a,b], \quad (a,\infty), \quad [a,\infty), \quad (-\infty,b), \quad (-\infty,b], \quad \mathbb{R},
\]
where $a, b \in \mathbb{R}$ and $a \le b$. If $a = b$, then $[a,a] = \{a\}$ is an interval that consists
of a single point, which, like the empty set, satisfies the definition vacuously.
Thus, Definition 5.62 is consistent with the usual definition of an interval.


To prove the previous claim, suppose that $I$ is an interval and let $a = \inf I$,
$b = \sup I$ where $-\infty \le a, b \le \infty$. If $a > b$, then $I = \emptyset$, and if $a = b$, then $I$ consists
of a single point $\{a\}$. Otherwise, $-\infty \le a < b \le \infty$. In that case, the definition of
the infimum and supremum implies that for every $a', b' \in \mathbb{R}$ with $a < a' < b' < b$,
there exist $x, y \in I$ such that $a \le x < a'$ and $b' < y \le b$. Since $I$ is an interval, it
follows that $I \supset [x,y] \supset [a', b']$, and since $a' > a$, $b' < b$ are arbitrary, it follows that
$I \supset (a,b)$. Moreover, since $a = \inf I$ and $b = \sup I$, the interval $I$ cannot contain
any points $x \in \mathbb{R}$ such that $x < a$ or $x > b$.


The slightly tricky part of the following theorem is the proof that every interval 
is connected. 


Theorem 5.63. A set of real numbers is connected if and only if it is an interval. 


Proof. First, suppose that $A \subset \mathbb{R}$ is not an interval. Then there are $a, b \in A$ and
$c \notin A$ such that $a < c < b$. If $U = (-\infty, c)$ and $V = (c, \infty)$, then $a \in A \cap U$,
$b \in A \cap V$, and $A = (A \cap U) \cup (A \cap V)$, so $A$ is disconnected. It follows that every
connected set is an interval.


To prove the converse, suppose that $I \subset \mathbb{R}$ is not connected. We will show that
$I$ is not an interval. Let $U$, $V$ be open sets that separate $I$. Choose $a \in I \cap U$ and
$b \in I \cap V$, where we can assume without loss of generality that $a < b$. Let
\[
c = \sup\left(U \cap [a,b]\right).
\]
We will prove that $a < c < b$ and $c \notin I$, meaning that $I$ is not an interval. If
$a \le x < b$ and $x \in U$, then $U \supset [x, x + \delta)$ for some $\delta > 0$, so $x \ne \sup(U \cap [a,b])$.
Thus, $c \ne a$ and if $a < c < b$, then $c \notin U$. If $a < y \le b$ and $y \in V$, then
$V \supset (y - \delta, y]$ for some $\delta > 0$, and therefore $(y - \delta, y]$ is disjoint from $U$, which
implies that $y \ne \sup(U \cap [a,b])$. It follows that $c \ne b$ and $c \notin U \cup V$, so $a < c < b$
and $c \notin I$, which completes the proof.




Figure 1. An illustration of the removal of middle-thirds from an interval in
the construction of the Cantor set. The figure shows the interval $[0, 1]$ and the
first four sets $F_1$, $F_2$, $F_3$, $F_4$, going from top to bottom.


5.5. * The Cantor set 


One of the most interesting examples of a compact set is the Cantor set, which is 
obtained by “removing middle-thirds” from closed intervals in [0,1], as illustrated 
in Figure 1.

We define a nested sequence $(F_n)$ of sets $F_n \subset [0,1]$ as follows. First, we remove
the middle-third from $[0,1]$ to get $F_1 = [0,1] \setminus (1/3, 2/3)$, or
\[
F_1 = I_0 \cup I_1, \qquad I_0 = \left[0, \frac{1}{3}\right], \quad I_1 = \left[\frac{2}{3}, 1\right].
\]


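Next, we remove middle-thirds from $I_0$ and $I_1$, which splits $I_0 \setminus (1/9, 2/9)$ into
$I_{00} \cup I_{01}$, and $I_1 \setminus (7/9, 8/9)$ into $I_{10} \cup I_{11}$, to get
\[
F_2 = I_{00} \cup I_{01} \cup I_{10} \cup I_{11},
\]
\[
I_{00} = \left[0, \frac{1}{9}\right], \quad I_{01} = \left[\frac{2}{9}, \frac{1}{3}\right], \quad I_{10} = \left[\frac{2}{3}, \frac{7}{9}\right], \quad I_{11} = \left[\frac{8}{9}, 1\right].
\]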


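Then we remove middle-thirds from $I_{00}$, $I_{01}$, $I_{10}$, and $I_{11}$ to get
\[
F_3 = I_{000} \cup I_{001} \cup I_{010} \cup I_{011} \cup I_{100} \cup I_{101} \cup I_{110} \cup I_{111},
\]
\[
I_{000} = \left[0, \frac{1}{27}\right], \quad I_{001} = \left[\frac{2}{27}, \frac{1}{9}\right], \quad I_{010} = \left[\frac{2}{9}, \frac{7}{27}\right], \quad I_{011} = \left[\frac{8}{27}, \frac{1}{3}\right],
\]
\[
I_{100} = \left[\frac{2}{3}, \frac{19}{27}\right], \quad I_{101} = \left[\frac{20}{27}, \frac{7}{9}\right], \quad I_{110} = \left[\frac{8}{9}, \frac{25}{27}\right], \quad I_{111} = \left[\frac{26}{27}, 1\right].
\]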




Continuing in this way, we get at the nth stage a set of the form
\[
F_n = \bigcup_{s \in \Sigma_n} I_s,
\]
where $\Sigma_n = \{(s_1, s_2, \dots, s_n) : s_k = 0, 1\}$ is the set of binary n-tuples. Furthermore,
each $I_s = [a_s, b_s]$ is a closed interval, and if $s = (s_1, s_2, \dots, s_n)$, then
\[
a_s = \sum_{k=1}^{n} \frac{2 s_k}{3^k}, \qquad b_s = a_s + \frac{1}{3^n}.
\]
In other words, the left endpoints $a_s$ are the points in $[0,1]$ that have a finite base
three expansion consisting entirely of 0's and 2's.

We can verify this formula for the endpoints $a_s$, $b_s$ of the intervals in $F_n$ by
induction. It holds when $n = 1$. Assume that it holds for some $n \in \mathbb{N}$. If we
remove the middle-third of length $1/3^{n+1}$ from the interval $[a_s, b_s]$ of length $1/3^n$
with $s \in \Sigma_n$, then we get the original left endpoint, which may be written as
\[
a_{s'} = \sum_{k=1}^{n+1} \frac{2 s'_k}{3^k},
\]
where $s' = (s'_1, \dots, s'_{n+1}) \in \Sigma_{n+1}$ is given by $s'_k = s_k$ for $k = 1, \dots, n$ and $s'_{n+1} = 0$.
We also get a new left endpoint $a_{s'} = a_s + 2/3^{n+1}$, which may be written as
\[
a_{s'} = \sum_{k=1}^{n+1} \frac{2 s'_k}{3^k},
\]
where $s'_k = s_k$ for $k = 1, \dots, n$ and $s'_{n+1} = 1$. Moreover, $b_{s'} = a_{s'} + 1/3^{n+1}$, which
proves that the formula for the endpoints holds for $n + 1$.
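
The endpoint formula can also be checked by direct computation for small n. The following is a minimal sketch, not part of the original notes: the helper names `cantor_intervals` and `left_endpoint` are hypothetical, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def cantor_intervals(n):
    """Return the 2**n closed intervals of F_n as (left, right) pairs, in increasing order."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_stage = []
        for a, b in intervals:
            third = (b - a) / 3
            next_stage.append((a, a + third))   # keep the left third
            next_stage.append((b - third, b))   # keep the right third
        intervals = next_stage
    return intervals

def left_endpoint(s):
    """a_s = sum of 2*s_k / 3**k over k = 1..n for a binary tuple s."""
    return sum(Fraction(2 * s_k, 3 ** k) for k, s_k in enumerate(s, start=1))

n = 3
# Build [a_s, a_s + 1/3**n] from the formula, with s in lexicographic order.
by_formula = [(left_endpoint(s), left_endpoint(s) + Fraction(1, 3 ** n))
              for s in product((0, 1), repeat=n)]
print(by_formula == cantor_intervals(n))
```

Lexicographic order on the binary tuples matches the left-to-right order of the intervals, which is why the two lists can be compared directly.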
Definition 5.64. The Cantor set $C$ is the intersection
\[
C = \bigcap_{n=1}^{\infty} F_n
\]
of the nested sequence of sets $(F_n)$ defined above.


The compactness of C follows immediately from its definition. 


Theorem 5.65. The Cantor set is compact. 


Proof. The Cantor set $C$ is bounded, since it is a subset of $[0,1]$. All the sets
$F_n$ are closed, because they are a finite union of closed intervals, so, from
Proposition 5.20, their intersection is closed. It follows that $C$ is closed and bounded, and
Theorem 5.36 implies that it is compact.


The Cantor set $C$ is clearly nonempty since the endpoints $a_s$, $b_s$ of $I_s$ are
contained in $F_n$ for every finite binary sequence $s$ and every $n \in \mathbb{N}$. These endpoints
form a countably infinite set. What may be initially surprising is that there are
uncountably many other points in $C$ that are not endpoints. For example, 1/4
has the infinite base three expansion $1/4 = 0.020202\ldots$, so it is not one of the
endpoints, but, as we will show, it belongs to $C$ because it has a base three expansion
consisting entirely of 0's and 2's.
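
The base three expansion of 1/4 quoted above can be reproduced with exact arithmetic. This is an illustrative sketch only; the function name `ternary_digits` is an assumption, and the greedy digit extraction returns the expansion that does not end in repeating 2's, so for an endpoint such as 1/3 it yields 0.1000... rather than 0.0222....

```python
from fractions import Fraction

def ternary_digits(x, count):
    """First `count` base-3 digits of x in [0, 1), computed exactly."""
    digits = []
    for _ in range(count):
        x *= 3
        d = int(x)      # the integer part is the next digit (x >= 0)
        digits.append(d)
        x -= d
    return digits

digits = ternary_digits(Fraction(1, 4), 8)
print(digits)  # [0, 2, 0, 2, 0, 2, 0, 2]
```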




Let $\Sigma$ be the set of binary sequences in Definition 1.29. The idea of the
next theorem is that each binary sequence picks out a unique point of the Cantor
set by telling us whether to choose the left or the right interval at each stage
of the “middle-thirds” construction. For example, 1/4 corresponds to the sequence
$(0,1,0,1,0,1,\dots)$, and we get it by alternately choosing left and right intervals.


Theorem 5.66. The Cantor set has the same cardinality as $\Sigma$.


Proof. We use the same notation as above. Let $s = (s_1, s_2, \dots, s_k, \dots) \in \Sigma$, and
define $\mathbf{s}_n = (s_1, s_2, \dots, s_n) \in \Sigma_n$. Then $(I_{\mathbf{s}_n})$ is a nested sequence of intervals
such that $\operatorname{diam} I_{\mathbf{s}_n} = 1/3^n \to 0$ as $n \to \infty$. Since each $I_{\mathbf{s}_n}$ is a compact interval,
Theorem 5.43 implies that there is a unique point
\[
x \in \bigcap_{n=1}^{\infty} I_{\mathbf{s}_n} \subset C.
\]
Thus, $s \mapsto x$ defines a function $f : \Sigma \to C$. Furthermore, this function is one-to-one:
if two sequences differ in the nth place, say, then the corresponding points in $C$
belong to different intervals $I_{\mathbf{s}_n}$ at the nth stage of the construction, and therefore
the points are different since the intervals are disjoint.


Conversely, if $x \in C$, then $x \in F_n$ for every $n \in \mathbb{N}$ and there is a unique
$\mathbf{s}_n \in \Sigma_n$ such that $x \in I_{\mathbf{s}_n}$. The intervals $(I_{\mathbf{s}_n})$ are nested, so there is a unique
sequence $s = (s_1, s_2, \dots, s_k, \dots) \in \Sigma$ such that $\mathbf{s}_n = (s_1, s_2, \dots, s_n)$. It follows
that $f : \Sigma \to C$ is onto, which proves the result.


The argument also shows that $x \in C$ if and only if it is a limit of left endpoints
$a_s$, meaning that
\[
x = \sum_{k=1}^{\infty} \frac{2 s_k}{3^k}, \qquad s_k = 0, 1.
\]
In other words, $x \in C$ if and only if it has a base 3 expansion consisting entirely of
0's and 2's. Note that this condition does not exclude 1, which corresponds to the
sequence $(1,1,1,1,\dots)$ or “always pick the right interval,” and
\[
1 = 0.2222\ldots = \sum_{k=1}^{\infty} \frac{2}{3^k}.
\]
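
The correspondence between binary sequences and points of the Cantor set can be illustrated numerically. The sketch below is an added example under stated assumptions: `cantor_partial_sum` is a hypothetical helper that evaluates the left endpoint for a finite prefix of a binary sequence, and the two prefixes used are the alternating sequence for 1/4 and the all-ones sequence for 1.

```python
from fractions import Fraction

def cantor_partial_sum(bits):
    """Left endpoint a_s = sum of 2*s_k / 3**k for a finite binary prefix `bits`."""
    return sum(Fraction(2 * s, 3 ** k) for k, s in enumerate(bits, start=1))

n = 10
alternating = [k % 2 for k in range(n)]   # prefix of (0, 1, 0, 1, ...)
all_ones = [1] * n                        # prefix of (1, 1, 1, 1, ...)

# Each limit point lies in [a_s, a_s + 1/3**n], so both gaps are at most 1/3**n.
gap_quarter = Fraction(1, 4) - cantor_partial_sum(alternating)
gap_one = 1 - cantor_partial_sum(all_ones)
print(gap_quarter, gap_one)
```

Both gaps are nonnegative and bounded by $1/3^n$, consistent with the limit point lying in the nth-stage interval selected by the prefix.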


We may use Theorem 5.66, together with the Schröder-Bernstein theorem, to
prove that $\Sigma$, $\mathcal{P}(\mathbb{N})$ and $\mathbb{R}$ have the same uncountable cardinality of the continuum.
It follows, in particular, that the Cantor set has the same cardinality as $\mathbb{R}$, even
though it appears, at first sight, to be a very sparse subset.


Theorem 5.67. The set R of real numbers has the same cardinality as P(N). 


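Proof. The inclusion map $f : C \to \mathbb{R}$, where $f(x) = x$, is one-to-one, so $C \lesssim \mathbb{R}$.
From Theorem 5.66 and Corollary 1.48, we have $C \approx \Sigma \approx \mathcal{P}(\mathbb{N})$, so $\mathcal{P}(\mathbb{N}) \lesssim \mathbb{R}$.


Conversely, the map from real numbers to their Dedekind cuts, given by
\[
g : \mathbb{R} \to \mathcal{P}(\mathbb{Q}), \qquad g : x \mapsto \{r \in \mathbb{Q} : r < x\},
\]
is one-to-one, so $\mathbb{R} \lesssim \mathcal{P}(\mathbb{Q})$. Since $\mathbb{Q}$ is countably infinite, $\mathcal{P}(\mathbb{N}) \approx \mathcal{P}(\mathbb{Q})$, so
$\mathbb{R} \lesssim \mathcal{P}(\mathbb{N})$. The conclusion then follows from the Schröder-Bernstein theorem.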




Another proof of this theorem, which doesn't require the Schröder-Bernstein
theorem, can be given by associating binary sequences in $\Sigma$ with binary expansions
of real numbers in $[0, 1]$:
\[
h : (s_1, s_2, \dots, s_k, \dots) \mapsto \sum_{k=1}^{\infty} \frac{s_k}{2^k}.
\]
Some real numbers, however, have two distinct binary expansions; e.g.,
\[
\frac{1}{2} = 0.10000\ldots = 0.01111\ldots.
\]
There are only countably many such numbers, so they do not affect the cardinality
of $[0,1]$, but they complicate the explicit construction of a one-to-one, onto map
$f : \Sigma \to \mathbb{R}$ by this approach. An alternative method is to represent real numbers
by continued fractions instead of binary expansions, but we won't describe these
proofs in more detail here.
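
As a quick check of the non-uniqueness example (a worked computation added for illustration, with the subscript 2 marking base two), the second expansion of 1/2 can be summed as a geometric series:

```latex
\[
0.01111\ldots_2 = \sum_{k=2}^{\infty} \frac{1}{2^k}
  = \frac{1/4}{1 - 1/2} = \frac{1}{2} = 0.10000\ldots_2 .
\]
```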


Chapter 6 


Limits of Functions 


In this chapter, we define limits of functions and describe their properties. 


6.1. Limits 
We begin with the $\epsilon$-$\delta$ definition of the limit of a function.


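Definition 6.1. Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and suppose that $c \in \mathbb{R}$ is an
accumulation point of $A$. Then
\[
\lim_{x \to c} f(x) = L
\]
if for every $\epsilon > 0$ there exists a $\delta > 0$ such that
\[
0 < |x - c| < \delta \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon.
\]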


We also denote limits by the ‘arrow’ notation $f(x) \to L$ as $x \to c$, and often
leave it to be implicitly understood that $x \in A$ is restricted to the domain of $f$.
Note that it follows directly from the definition that
\[
\lim_{x \to c} f(x) = L \quad \text{if and only if} \quad \lim_{x \to c} |f(x) - L| = 0.
\]


In defining a limit as $x \to c$, we do not consider what happens when $x = c$,
and a function needn't be defined at $c$ for its limit to exist. This is the case, for
example, when we define the derivative of a function as a limit of its difference
quotients. Moreover, even if a function is defined at $c$ and its limit as $x \to c$ exists,
the value of the function need not equal the limit. In fact, the condition that
$\lim_{x \to c} f(x) = f(c)$ defines the continuity of $f$ at $c$. We study continuous functions
in Chapter 7.


Example 6.2. Let $A = [0,\infty) \setminus \{9\}$ and define $f : A \to \mathbb{R}$ by
\[
f(x) = \frac{x - 9}{\sqrt{x} - 3}.
\]






We claim that
\[
\lim_{x \to 9} f(x) = 6.
\]


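To prove this, let $\epsilon > 0$ be given. If $x \in A$, then $\sqrt{x} - 3 \ne 0$, and dividing this
factor into the numerator we get $f(x) = \sqrt{x} + 3$. It follows that
\[
|f(x) - 6| = \left|\sqrt{x} - 3\right| = \left|\frac{x - 9}{\sqrt{x} + 3}\right| \le \frac{1}{3}\,|x - 9|.
\]
Thus, if $\delta = 3\epsilon$, then $x \in A$ and $|x - 9| < \delta$ implies that $|f(x) - 6| < \epsilon$.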
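
The choice $\delta = 3\epsilon$ can be spot-checked numerically. The following is a rough sketch in floating-point arithmetic, not a proof: the function name `delta_works` and the sampling grid are assumptions made for illustration, and points with $x < 0$ are skipped because they lie outside the domain $A$.

```python
import math

def f(x):
    """f(x) = (x - 9) / (sqrt(x) - 3) on A = [0, inf) minus {9}."""
    return (x - 9) / (math.sqrt(x) - 3)

def delta_works(eps, steps=1000):
    """Test |f(x) - 6| < eps on a grid of points with 0 < |x - 9| < 3*eps."""
    delta = 3 * eps
    for j in range(1, steps + 1):
        h = delta * j / (steps + 1)      # 0 < h < delta
        for x in (9 - h, 9 + h):
            if x < 0:
                continue                 # x is outside the domain A
            if abs(f(x) - 6) >= eps:
                return False
    return True

print(all(delta_works(eps) for eps in (5.0, 1.0, 0.1, 0.001)))
```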


Like the limits of sequences, limits of functions are unique. 
Proposition 6.3. The limit of a function is unique if it exists. 


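Proof. Suppose that $f : A \to \mathbb{R}$ and $c \in \mathbb{R}$ is an accumulation point of $A \subset \mathbb{R}$.
Assume that
\[
\lim_{x \to c} f(x) = L_1, \qquad \lim_{x \to c} f(x) = L_2
\]
where $L_1, L_2 \in \mathbb{R}$. For every $\epsilon > 0$ there exist $\delta_1, \delta_2 > 0$ such that
\[
\begin{aligned}
&0 < |x - c| < \delta_1 \text{ and } x \in A \text{ implies that } |f(x) - L_1| < \epsilon/2, \\
&0 < |x - c| < \delta_2 \text{ and } x \in A \text{ implies that } |f(x) - L_2| < \epsilon/2.
\end{aligned}
\]
Let $\delta = \min(\delta_1, \delta_2) > 0$. Then, since $c$ is an accumulation point of $A$, there exists
$x \in A$ such that $0 < |x - c| < \delta$. It follows that
\[
|L_1 - L_2| \le |L_1 - f(x)| + |f(x) - L_2| < \epsilon.
\]
Since this holds for arbitrary $\epsilon > 0$, we must have $L_1 = L_2$.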


Note that in this proof we used the requirement in the definition of a limit 
that c is an accumulation point of A. The limit definition would be vacuous if it 
was applied to a non-accumulation point, and in that case every L € R would be a 
limit. 

We can rephrase the $\epsilon$-$\delta$ definition of limits in terms of neighborhoods. Recall
from Definition 5.6 that a set $V \subset \mathbb{R}$ is a neighborhood of $c \in \mathbb{R}$ if $V \supset (c - \delta, c + \delta)$
for some $\delta > 0$, and $(c - \delta, c + \delta)$ is called a $\delta$-neighborhood of $c$.


Definition 6.4. A set $U \subset \mathbb{R}$ is a punctured (or deleted) neighborhood of $c \in \mathbb{R}$
if $U \supset (c - \delta, c) \cup (c, c + \delta)$ for some $\delta > 0$. The set $(c - \delta, c) \cup (c, c + \delta)$ is called a
punctured (or deleted) $\delta$-neighborhood of $c$.


That is, a punctured neighborhood of c is a neighborhood of c with the point 
c itself removed. 


Definition 6.5. Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and suppose that $c \in \mathbb{R}$ is an
accumulation point of $A$. Then
\[
\lim_{x \to c} f(x) = L
\]
if and only if for every neighborhood $V$ of $L$, there is a punctured neighborhood $U$
of $c$ such that
\[
x \in A \cap U \text{ implies that } f(x) \in V.
\]




This is essentially a rewording of the $\epsilon$-$\delta$ definition. If Definition 6.1 holds and
$V$ is a neighborhood of $L$, then $V$ contains an $\epsilon$-neighborhood of $L$, so there is a
punctured $\delta$-neighborhood $U$ of $c$ such that $f$ maps $U \cap A$ into $V$, which verifies
Definition 6.5. Conversely, if Definition 6.5 holds and $\epsilon > 0$, then $V = (L - \epsilon, L + \epsilon)$
is a neighborhood of $L$, so there is a punctured neighborhood $U$ of $c$ such that $f$
maps $U \cap A$ into $V$, and $U$ contains a punctured $\delta$-neighborhood of $c$, which verifies
Definition 6.1.


The next theorem gives an equivalent sequential characterization of the limit. 


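Theorem 6.6. Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and suppose that $c \in \mathbb{R}$ is an
accumulation point of $A$. Then
\[
\lim_{x \to c} f(x) = L
\]
if and only if
\[
\lim_{n \to \infty} f(x_n) = L
\]
for every sequence $(x_n)$ in $A$ with $x_n \ne c$ for all $n \in \mathbb{N}$ such that
\[
\lim_{n \to \infty} x_n = c.
\]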
noo 


Proof. First assume that the limit exists and is equal to $L$. Suppose that $(x_n)$ is
any sequence in $A$ with $x_n \ne c$ that converges to $c$, and let $\epsilon > 0$ be given. From
Definition 6.1, there exists $\delta > 0$ such that $|f(x) - L| < \epsilon$ whenever $0 < |x - c| < \delta$,
and since $x_n \to c$ there exists $N \in \mathbb{N}$ such that $0 < |x_n - c| < \delta$ for all $n > N$. It
follows that $|f(x_n) - L| < \epsilon$ whenever $n > N$, so $f(x_n) \to L$ as $n \to \infty$.

To prove the converse, assume that the limit does not exist or is not equal to
$L$. Then there is an $\epsilon_0 > 0$ such that for every $\delta > 0$ there is a point $x \in A$ with
$0 < |x - c| < \delta$ but $|f(x) - L| \ge \epsilon_0$. Therefore, for every $n \in \mathbb{N}$ there is an $x_n \in A$
such that
\[
0 < |x_n - c| < \frac{1}{n}, \qquad |f(x_n) - L| \ge \epsilon_0.
\]
It follows that $x_n \ne c$ and $x_n \to c$, but $f(x_n) \not\to L$, so the sequential condition does
not hold. This proves the result.


A non-existence proof for a limit directly from Definition 6.1 is often awkward.
(One has to show that for every $L \in \mathbb{R}$ there exists $\epsilon_0 > 0$ such that for every $\delta > 0$
there exists $x \in A$ with $0 < |x - c| < \delta$ and $|f(x) - L| \ge \epsilon_0$.) The previous theorem
gives a convenient way to show that a limit of a function does not exist.


Corollary 6.7. Suppose that $f : A \to \mathbb{R}$ and $c \in \mathbb{R}$ is an accumulation point of $A$.
Then $\lim_{x \to c} f(x)$ does not exist if either of the following conditions holds:


(1) There are sequences $(x_n)$, $(y_n)$ in $A$ with $x_n, y_n \ne c$ such that
\[
\lim_{n \to \infty} x_n = \lim_{n \to \infty} y_n = c, \quad \text{but} \quad \lim_{n \to \infty} f(x_n) \ne \lim_{n \to \infty} f(y_n).
\]

(2) There is a sequence $(x_n)$ in $A$ with $x_n \ne c$ such that $\lim_{n \to \infty} x_n = c$ but the
sequence $(f(x_n))$ diverges.




Figure 1. A plot of the function $y = \sin(1/x)$, with the hyperbola $y = 1/x$
shown in red, and a detail near the origin.


Example 6.8. Define the sign function $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$ by
\[
\operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}
\]
Then the limit
\[
\lim_{x \to 0} \operatorname{sgn} x
\]
doesn't exist. To prove this, note that $(1/n)$ is a non-zero sequence such that
$1/n \to 0$ and $\operatorname{sgn}(1/n) \to 1$ as $n \to \infty$, while $(-1/n)$ is a non-zero sequence such
that $-1/n \to 0$ and $\operatorname{sgn}(-1/n) \to -1$ as $n \to \infty$. Since the sequences of sgn-values
have different limits, Corollary 6.7 implies that the limit does not exist.


Example 6.9. The limit
\[
\lim_{x \to 0} \frac{1}{x},
\]
corresponding to the function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by $f(x) = 1/x$, doesn't exist.
For example, if $(x_n)$ is the non-zero sequence given by $x_n = 1/n$, then $1/n \to 0$ but
the sequence of values $(n)$ diverges to $\infty$.


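Example 6.10. The limit
\[
\lim_{x \to 0} \sin\left(\frac{1}{x}\right),
\]
corresponding to the function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by $f(x) = \sin(1/x)$, doesn't
exist. (See Figure 1.) For example, the non-zero sequences $(x_n)$, $(y_n)$ defined by
\[
x_n = \frac{1}{2\pi n}, \qquad y_n = \frac{1}{2\pi n + \pi/2}
\]
both converge to zero as $n \to \infty$, but the limits
\[
\lim_{n \to \infty} f(x_n) = 0, \qquad \lim_{n \to \infty} f(y_n) = 1
\]
are different.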
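
The two sequences in this example are easy to evaluate numerically. The sketch below is an added illustration in floating-point arithmetic, so the computed values of $f(x_n)$ are only approximately 0; the variable names are assumptions.

```python
import math

def f(x):
    """f(x) = sin(1/x) for x != 0."""
    return math.sin(1 / x)

for n in (1, 10, 100, 1000):
    x_n = 1 / (2 * math.pi * n)                 # 1/x_n = 2*pi*n, so sin is 0
    y_n = 1 / (2 * math.pi * n + math.pi / 2)   # 1/y_n = 2*pi*n + pi/2, so sin is 1
    print(n, x_n, f(x_n), y_n, f(y_n))
```

Both sequences tend to 0, while the function values stay near 0 and near 1 respectively, which is the situation described in condition (1) of Corollary 6.7.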




Like sequences, functions must satisfy a boundedness condition if their limit is 
to exist. Before stating this condition, we define the supremum and infimum of a 
function, which are the supremum or infimum of its range. 


Definition 6.11. If $f : A \to \mathbb{R}$ is a real-valued function, then
\[
\sup_A f = \sup\{f(x) : x \in A\}, \qquad \inf_A f = \inf\{f(x) : x \in A\}.
\]


A function is bounded if its range is bounded. 


Definition 6.12. If $f : A \to \mathbb{R}$, then $f$ is bounded from above if $\sup_A f$ is finite,
bounded from below if $\inf_A f$ is finite, and bounded if both are finite. A function
that is not bounded is said to be unbounded.


Example 6.13. If $f : [0,2] \to \mathbb{R}$ is defined by $f(x) = x^2$, then
\[
\sup_{[0,2]} f = 4, \qquad \inf_{[0,2]} f = 0,
\]
so $f$ is bounded.

Example 6.14. If $f : (0,1] \to \mathbb{R}$ is defined by $f(x) = 1/x$, then
\[
\sup_{(0,1]} f = \infty, \qquad \inf_{(0,1]} f = 1,
\]
so $f$ is bounded from below, not bounded from above, and unbounded. Note that
if we extend $f$ to a function $g : [0,1] \to \mathbb{R}$ by defining, for example,
\[
g(x) = \begin{cases} 1/x & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0, \end{cases}
\]
then $g$ is still unbounded on $[0, 1]$.

Equivalently, a function $f : A \to \mathbb{R}$ is bounded if $\sup_A |f|$ is finite, meaning
that there exists $M \ge 0$ such that
\[
|f(x)| \le M \quad \text{for every } x \in A.
\]
If $B \subset A$, then we say that $f$ is bounded from above on $B$ if $\sup_B f$ is finite, with
similar terminology for bounded from below on $B$, and bounded on $B$.


Example 6.15. The function $f : (0,1] \to \mathbb{R}$ defined by $f(x) = 1/x$ is unbounded,
but it is bounded on every interval $[\delta, 1]$ with $0 < \delta < 1$. The function $g : \mathbb{R} \to \mathbb{R}$
defined by $g(x) = x^2$ is unbounded, but it is bounded on every finite interval $[a, b]$.


We also introduce a notion of being bounded near a point. 


Definition 6.16. Suppose that $f : A \to \mathbb{R}$ and $c$ is an accumulation point of $A$.
Then $f$ is locally bounded at $c$ if there is a neighborhood $U$ of $c$ such that $f$ is
bounded on $A \cap U$.


Example 6.17. The function $f : (0,1) \to \mathbb{R}$ defined by $f(x) = 1/x$ is locally
bounded at every $0 < c < 1$, but it is not locally bounded at 0.


Proposition 6.18. Suppose that $f : A \to \mathbb{R}$ and $c$ is an accumulation point of $A$.
If $\lim_{x \to c} f(x)$ exists, then $f$ is locally bounded at $c$.




Proof. Let $\lim_{x \to c} f(x) = L$. Taking $\epsilon = 1$ in the definition of the limit, we get
that there exists a $\delta > 0$ such that
\[
0 < |x - c| < \delta \text{ and } x \in A \text{ implies that } |f(x) - L| < 1.
\]
Let $U = (c - \delta, c + \delta)$. If $x \in A \cap U$ and $x \ne c$, then
\[
|f(x)| \le |f(x) - L| + |L| < 1 + |L|,
\]
so $f$ is bounded on $A \cap U$. (If $c \in A$, then $|f| \le \max\{1 + |L|, |f(c)|\}$ on $A \cap U$.)


As for sequences, boundedness is a necessary but not sufficient condition for 
the existence of a limit. 


Example 6.19. The limit
\[
\lim_{x \to 0} \frac{1}{x},
\]
considered in Example 6.9, doesn't exist because the function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given
by $f(x) = 1/x$ is not locally bounded at 0.


Example 6.20. The function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ defined by
\[
f(x) = \sin\left(\frac{1}{x}\right)
\]
is bounded, but $\lim_{x \to 0} f(x)$ doesn't exist.


6.2. Left, right, and infinite limits 


We can define other kinds of limits in an obvious way. We list some of them here 
and give examples, whose proofs are left as an exercise. All these definitions can be 
combined in various ways and have obvious equivalent sequential characterizations. 


Definition 6.21 (Right and left limits). Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$. If $c \in \mathbb{R}$ is
an accumulation point of $\{x \in A : x > c\}$, then $f$ has the right limit
\[
\lim_{x \to c^+} f(x) = L,
\]
if for every $\epsilon > 0$ there exists a $\delta > 0$ such that
\[
c < x < c + \delta \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon.
\]
If $c \in \mathbb{R}$ is an accumulation point of $\{x \in A : x < c\}$, then $f$ has the left limit
\[
\lim_{x \to c^-} f(x) = L,
\]
if for every $\epsilon > 0$ there exists a $\delta > 0$ such that
\[
c - \delta < x < c \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon.
\]
Equivalently, the right limit of $f$ is the limit of the restriction $f|_{A^+}$ of $f$ to the
set $A^+ = \{x \in A : x > c\}$,
\[
\lim_{x \to c^+} f(x) = \lim_{x \to c} f|_{A^+}(x),
\]
and analogously for the left limit.




Example 6.22. For the sign function in Example 6.8, we have
\[
\lim_{x \to 0^+} \operatorname{sgn} x = 1, \qquad \lim_{x \to 0^-} \operatorname{sgn} x = -1,
\]
although the corresponding limit does not exist.


The existence and equality of the left and right limits implies the existence of 
the limit. 


Proposition 6.23. Suppose that $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and $c \in \mathbb{R}$ is an
accumulation point of both $\{x \in A : x > c\}$ and $\{x \in A : x < c\}$. Then
\[
\lim_{x \to c} f(x) = L
\]
if and only if
\[
\lim_{x \to c^+} f(x) = \lim_{x \to c^-} f(x) = L.
\]


Proof. It follows immediately from the definitions that the existence of the limit
implies the existence of the left and right limits with the same value. Conversely,
if both left and right limits exist and are equal to $L$, then given $\epsilon > 0$, there exist
$\delta_1 > 0$ and $\delta_2 > 0$ such that
\[
\begin{aligned}
&c - \delta_1 < x < c \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon, \\
&c < x < c + \delta_2 \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon.
\end{aligned}
\]
Choosing $\delta = \min(\delta_1, \delta_2) > 0$, we get that
\[
0 < |x - c| < \delta \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon,
\]
which shows that the limit exists.


Next we introduce some convenient definitions for various kinds of limits involving
infinity. We emphasize that $\infty$ and $-\infty$ are not real numbers (what is $\sin \infty$,
for example?) and all these definitions have precise translations into statements that
involve only real numbers.


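Definition 6.24 (Limits as $x \to \pm\infty$). Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$. If $A$ is not
bounded from above, then
\[
\lim_{x \to \infty} f(x) = L
\]
if for every $\epsilon > 0$ there exists an $M \in \mathbb{R}$ such that
\[
x > M \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon.
\]
If $A$ is not bounded from below, then
\[
\lim_{x \to -\infty} f(x) = L
\]
if for every $\epsilon > 0$ there exists an $m \in \mathbb{R}$ such that
\[
x < m \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon.
\]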




Sometimes we write $+\infty$ instead of $\infty$ to indicate that it denotes arbitrarily
large, positive values, while $-\infty$ denotes arbitrarily large, negative values.


It follows from the definitions that
\[
\lim_{x \to \infty} f(x) = \lim_{t \to 0^+} f\left(\frac{1}{t}\right), \qquad \lim_{x \to -\infty} f(x) = \lim_{t \to 0^-} f\left(\frac{1}{t}\right),
\]
and it is often useful to convert one of these limits into the other.
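
For instance (an illustrative computation that is not taken from the original text, using the function $x/(x+1)$ on $(0,\infty)$ as an assumed example), the substitution $t = 1/x$ converts a limit at infinity into a right limit at 0:

```latex
\[
\lim_{x \to \infty} \frac{x}{x + 1}
  = \lim_{t \to 0^+} \frac{1/t}{1/t + 1}
  = \lim_{t \to 0^+} \frac{1}{1 + t}
  = 1.
\]
```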


Example 6.25. We have 


Definition 6.26 (Divergence to $\pm\infty$). Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and suppose
that $c \in \mathbb{R}$ is an accumulation point of $A$. Then
\[
\lim_{x \to c} f(x) = \infty
\]
if for every $M \in \mathbb{R}$ there exists a $\delta > 0$ such that
\[
0 < |x - c| < \delta \text{ and } x \in A \text{ implies that } f(x) > M,
\]
and
\[
\lim_{x \to c} f(x) = -\infty
\]
if for every $m \in \mathbb{R}$ there exists a $\delta > 0$ such that
\[
0 < |x - c| < \delta \text{ and } x \in A \text{ implies that } f(x) < m.
\]


The notation $\lim_{x \to c} f(x) = \pm\infty$ is simply shorthand for the property stated
in this definition; it does not mean that the limit exists, and we say that $f$ diverges
to $\pm\infty$.


Example 6.27. We have
\[
\lim_{x \to 0} \frac{1}{x} \ne \pm\infty,
\]
since $1/x$ takes arbitrarily large positive (if $x > 0$) and negative (if $x < 0$) values
in every two-sided neighborhood of 0.


Example 6.29. None of the limits
\[
\lim_{x \to 0^+} \frac{1}{x} \sin\left(\frac{1}{x}\right), \qquad \lim_{x \to 0^-} \frac{1}{x} \sin\left(\frac{1}{x}\right), \qquad \lim_{x \to 0} \frac{1}{x} \sin\left(\frac{1}{x}\right)
\]
is $\infty$ or $-\infty$, since $(1/x)\sin(1/x)$ oscillates between arbitrarily large positive and
negative values in every one-sided or two-sided neighborhood of 0.




Example 6.30. We have
\[
\lim_{x \to \infty} \left(\frac{1}{x} - x^3\right) = -\infty, \qquad \lim_{x \to -\infty} \left(\frac{1}{x} - x^3\right) = \infty.
\]
How would you define these statements precisely and prove them?


6.3. Properties of limits 


The properties of limits of functions follow from the corresponding properties of
sequences and the sequential characterization of the limit in Theorem 6.6. We can
also prove them directly from the $\epsilon$-$\delta$ definition of the limit.


6.3.1. Order properties. As for limits of sequences, limits of functions preserve 
(non-strict) inequalities. 


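Theorem 6.31. Suppose that $f, g : A \to \mathbb{R}$ and $c$ is an accumulation point of $A$.
If
\[
f(x) \le g(x) \quad \text{for all } x \in A,
\]
and $\lim_{x \to c} f(x)$, $\lim_{x \to c} g(x)$ exist, then
\[
\lim_{x \to c} f(x) \le \lim_{x \to c} g(x).
\]

Proof. Let
\[
\lim_{x \to c} f(x) = L, \qquad \lim_{x \to c} g(x) = M.
\]
Suppose for contradiction that $L > M$, and let
\[
\epsilon = \frac{1}{2}(L - M) > 0.
\]
From the definition of the limit, there exist $\delta_1, \delta_2 > 0$ such that
\[
\begin{aligned}
&|f(x) - L| < \epsilon \quad \text{if } x \in A \text{ and } 0 < |x - c| < \delta_1, \\
&|g(x) - M| < \epsilon \quad \text{if } x \in A \text{ and } 0 < |x - c| < \delta_2.
\end{aligned}
\]
Let $\delta = \min(\delta_1, \delta_2)$. Since $c$ is an accumulation point of $A$, there exists $x \in A$ such
that $0 < |x - c| < \delta$, and it follows that
\[
\begin{aligned}
f(x) - g(x) &= [f(x) - L] + L - M + [M - g(x)] \\
&> L - M - 2\epsilon \\
&= 0,
\end{aligned}
\]
which contradicts the assumption that $f(x) \le g(x)$.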


Finally, we state a useful “sandwich” or “squeeze” criterion for the existence of 
a limit. 


Theorem 6.32. Suppose that $f, g, h : A \to \mathbb{R}$ and $c$ is an accumulation point of
$A$. If
\[
f(x) \le g(x) \le h(x) \quad \text{for all } x \in A
\]
and
\[
\lim_{x \to c} f(x) = \lim_{x \to c} h(x) = L,
\]
then the limit of $g(x)$ as $x \to c$ exists and
\[
\lim_{x \to c} g(x) = L.
\]

We leave the proof as an exercise. We often use this result, without comment,
in the following way: If
\[
0 \le f(x) \le g(x) \quad \text{or} \quad |f(x)| \le g(x)
\]
and $g(x) \to 0$ as $x \to c$, then $f(x) \to 0$ as $x \to c$.

It is essential for the bounding functions $f$, $h$ in Theorem 6.32 to have the same
limit.

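Example 6.33. We have
\[
-1 \le \sin\left(\frac{1}{x}\right) \le 1 \quad \text{for all } x \ne 0
\]
and
\[
\lim_{x \to 0} (-1) = -1, \qquad \lim_{x \to 0} 1 = 1,
\]
but
\[
\lim_{x \to 0} \sin\left(\frac{1}{x}\right) \text{ does not exist.}
\]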


6.3.2. Algebraic properties. Limits of functions respect algebraic operations. 


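Theorem 6.34. Suppose that $f, g : A \to \mathbb{R}$, $c$ is an accumulation point of $A$, and
the limits
\[
\lim_{x \to c} f(x) = L, \qquad \lim_{x \to c} g(x) = M
\]
exist. Then
\[
\begin{aligned}
\lim_{x \to c} k f(x) &= kL \quad \text{for every } k \in \mathbb{R}, \\
\lim_{x \to c} [f(x) + g(x)] &= L + M, \\
\lim_{x \to c} [f(x) g(x)] &= LM, \\
\lim_{x \to c} \frac{f(x)}{g(x)} &= \frac{L}{M} \quad \text{if } M \ne 0.
\end{aligned}
\]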


Proof. We prove the results for sums and products from the definition of the limit,
and leave the remaining proofs as an exercise. All of the results also follow from
the corresponding results for sequences.


First, we consider the limit of $f + g$. Given $\epsilon > 0$, choose $\delta_1$, $\delta_2$ such that
\[
\begin{aligned}
&0 < |x - c| < \delta_1 \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon/2, \\
&0 < |x - c| < \delta_2 \text{ and } x \in A \text{ implies that } |g(x) - M| < \epsilon/2,
\end{aligned}
\]
and let $\delta = \min(\delta_1, \delta_2) > 0$. Then $0 < |x - c| < \delta$ and $x \in A$ implies that
\[
|f(x) + g(x) - (L + M)| \le |f(x) - L| + |g(x) - M| < \epsilon,
\]
which proves that $\lim (f + g) = \lim f + \lim g$.


To prove the result for the limit of the product, first note that from the local
boundedness of functions with a limit (Proposition 6.18) there exist $\delta_0 > 0$ and
$K > 0$ such that $|g(x)| \le K$ for all $x \in A$ with $0 < |x - c| < \delta_0$. Choose $\delta_1, \delta_2 > 0$
such that
\[
\begin{aligned}
&0 < |x - c| < \delta_1 \text{ and } x \in A \text{ implies that } |f(x) - L| < \epsilon/(2K), \\
&0 < |x - c| < \delta_2 \text{ and } x \in A \text{ implies that } |g(x) - M| < \epsilon/(2|L| + 1).
\end{aligned}
\]
Let $\delta = \min(\delta_0, \delta_1, \delta_2) > 0$. Then for $0 < |x - c| < \delta$ and $x \in A$,
\[
\begin{aligned}
|f(x) g(x) - LM| &= |(f(x) - L)\, g(x) + L\, (g(x) - M)| \\
&\le |f(x) - L|\, |g(x)| + |L|\, |g(x) - M| \\
&< \frac{\epsilon}{2K} \cdot K + |L| \cdot \frac{\epsilon}{2|L| + 1} \\
&< \epsilon,
\end{aligned}
\]
which proves that $\lim (fg) = \lim f \, \lim g$.


Chapter 7 


Continuous Functions 


In this chapter, we define continuous functions and study their properties. 


7.1. Continuity 


Continuous functions are functions that take nearby values at nearby points. 


Definition 7.1. Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, and suppose that $c \in A$. Then $f$ is
continuous at $c$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that
\[
|x - c| < \delta \text{ and } x \in A \text{ implies that } |f(x) - f(c)| < \epsilon.
\]
A function $f : A \to \mathbb{R}$ is continuous if it is continuous at every point of $A$, and it is
continuous on $B \subset A$ if it is continuous at every point in $B$.


The definition of continuity at a point may be stated in terms of neighborhoods 
as follows. 


Definition 7.2. A function $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$, is continuous at $c \in A$ if for
every neighborhood $V$ of $f(c)$ there is a neighborhood $U$ of $c$ such that
\[
x \in A \cap U \text{ implies that } f(x) \in V.
\]


The $\epsilon$-$\delta$ definition corresponds to the case when $V$ is an $\epsilon$-neighborhood of $f(c)$
and $U$ is a $\delta$-neighborhood of $c$.


Note that $c$ must belong to the domain $A$ of $f$ in order to define the continuity
of $f$ at $c$. If $c$ is an isolated point of $A$, then the continuity condition holds
automatically since, for sufficiently small $\delta > 0$, the only point $x \in A$ with $|x - c| < \delta$
is $x = c$, and then $0 = |f(x) - f(c)| < \epsilon$. Thus, a function is continuous at every
isolated point of its domain, and isolated points are not of much interest.


If $c \in A$ is an accumulation point of $A$, then the continuity of $f$ at $c$ is equivalent
to the condition that
\[
\lim_{x \to c} f(x) = f(c),
\]
meaning that the limit of $f$ as $x \to c$ exists and is equal to the value of $f$ at $c$.




Example 7.3. If $f : (a,b) \to \mathbb{R}$ is defined on an open interval, then $f$ is continuous
on $(a,b)$ if and only if
\[
\lim_{x \to c} f(x) = f(c) \quad \text{for every } a < c < b
\]
since every point of $(a,b)$ is an accumulation point.


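Example 7.4. If $f : [a,b] \to \mathbb{R}$ is defined on a closed, bounded interval, then $f$ is
continuous on $[a,b]$ if and only if
\[
\begin{aligned}
&\lim_{x \to c} f(x) = f(c) \quad \text{for every } a < c < b, \\
&\lim_{x \to a^+} f(x) = f(a), \qquad \lim_{x \to b^-} f(x) = f(b).
\end{aligned}
\]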


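Example 7.5. Suppose that
\[
A = \{0\} \cup \left\{\frac{1}{n} : n \in \mathbb{N}\right\}
\]
and $f : A \to \mathbb{R}$ is defined by
\[
f(0) = y_0, \qquad f\left(\frac{1}{n}\right) = y_n
\]
for some values $y_0, y_n \in \mathbb{R}$. Then $1/n$ is an isolated point of $A$ for every $n \in \mathbb{N}$,
so $f$ is continuous at $1/n$ for every choice of $y_n$. The remaining point $0 \in A$ is an
accumulation point of $A$, and the condition for $f$ to be continuous at 0 is that
\[
\lim_{n \to \infty} y_n = y_0.
\]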


As for limits, we can give an equivalent sequential definition of continuity, which
follows immediately from Theorem 6.6.


Theorem 7.6. If $f : A \to \mathbb{R}$ and $c \in A$ is an accumulation point of $A$, then $f$ is
continuous at $c$ if and only if
\[
\lim_{n \to \infty} f(x_n) = f(c)
\]
for every sequence $(x_n)$ in $A$ such that $x_n \to c$ as $n \to \infty$.


In particular, $f$ is discontinuous at $c \in A$ if there is a sequence $(x_n)$ in the domain
$A$ of $f$ such that $x_n \to c$ but $f(x_n) \not\to f(c)$.


Let’s consider some examples of continuous and discontinuous functions to 
illustrate the definition. 


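Example 7.7. The function $f : [0,\infty) \to \mathbb{R}$ defined by $f(x) = \sqrt{x}$ is continuous
on $[0,\infty)$. To prove that $f$ is continuous at $c > 0$, we note that for $0 \le x < \infty$,

$$|f(x) - f(c)| = \left| \sqrt{x} - \sqrt{c} \right| = \left| \frac{x - c}{\sqrt{x} + \sqrt{c}} \right| \le \frac{1}{\sqrt{c}} |x - c|,$$

so given $\epsilon > 0$, we can choose $\delta = \sqrt{c}\,\epsilon > 0$ in the definition of continuity. To prove
that $f$ is continuous at $0$, we note that if $0 \le x < \delta$ where $\delta = \epsilon^2 > 0$, then

$$|f(x) - f(0)| = \sqrt{x} < \epsilon.$$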
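As a numerical sanity check of the choice $\delta = \sqrt{c}\,\epsilon$ in Example 7.7, the following Python sketch samples points with $|x - c| < \delta$ and tests whether $|\sqrt{x} - \sqrt{c}| < \epsilon$ at each of them. The function name `check_sqrt_delta` and the sampling scheme are illustrative assumptions and are not part of these notes; sampling can support the estimate but does not replace the proof.

```python
import math

def check_sqrt_delta(c: float, eps: float, samples: int = 1000) -> bool:
    """Sample x >= 0 with |x - c| < delta, where delta = sqrt(c) * eps,
    and test whether |sqrt(x) - sqrt(c)| < eps at every sample."""
    delta = math.sqrt(c) * eps
    for k in range(1, samples):
        # Points strictly inside (c - delta, c + delta), clipped to the domain [0, inf).
        x = max(0.0, c - delta + 2 * delta * k / samples)
        if abs(math.sqrt(x) - math.sqrt(c)) >= eps:
            return False
    return True

print(check_sqrt_delta(4.0, 0.1))    # delta = 0.2
print(check_sqrt_delta(0.01, 0.5))   # delta = 0.05, so the interval reaches past 0
```

The second call illustrates why the estimate is stated for all $x \ge 0$: when $\delta > c$, the interval around $c$ is cut off by the endpoint of the domain.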




Example 7.8. The function $\sin : \mathbb{R} \to \mathbb{R}$ is continuous on $\mathbb{R}$. To prove this, we use
the trigonometric identity for the difference of sines and the inequality $|\sin x| \le |x|$:

$$|\sin x - \sin c| = \left| 2 \cos\left( \frac{x + c}{2} \right) \sin\left( \frac{x - c}{2} \right) \right| \le 2 \left| \sin\left( \frac{x - c}{2} \right) \right| \le |x - c|.$$

It follows that we can take $\delta = \epsilon$ in the definition of continuity for every $c \in \mathbb{R}$.


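Example 7.9. The sign function $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$, defined by

$$\operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0, \end{cases}$$

is not continuous at $0$ since $\lim_{x \to 0} \operatorname{sgn} x$ does not exist (see Example 6.8). The left
and right limits of $\operatorname{sgn}$ at $0$,

$$\lim_{x \to 0^-} \operatorname{sgn} x = -1, \qquad \lim_{x \to 0^+} \operatorname{sgn} x = 1,$$

do exist, but they are unequal. We say that the function has a jump discontinuity at $0$.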
Example 7.10. The function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} 1/x & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}$$

is not continuous at $0$ since $\lim_{x \to 0} f(x)$ does not exist (see Example 6.9). The left
and right limits of $f$ at $0$ do not exist either, and we say that $f$ has an essential
discontinuity at $0$.


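Example 7.11. The function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$

is continuous at $c \ne 0$ (see Example 7.21 below) but discontinuous at $0$ because
$\lim_{x \to 0} f(x)$ does not exist (see Example 6.10).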


Example 7.12. The function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} x \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$

is continuous at every point of $\mathbb{R}$. (See Figure 1.) The continuity at $c \ne 0$ is proved
in Example 7.22 below. To prove continuity at $0$, note that for $x \ne 0$,

$$|f(x) - f(0)| = |x \sin(1/x)| \le |x|,$$

so $f(x) \to f(0)$ as $x \to 0$. If we had defined $f(0)$ to be any value other than
$0$, then $f$ would not be continuous at $0$. In that case, $f$ would have a removable
discontinuity at $0$.


Figure 1. A plot of the function $y = x \sin(1/x)$ and a detail near the origin
with the lines $y = \pm x$ shown in red.


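Example 7.13. The Dirichlet function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases}$$

is discontinuous at every $c \in \mathbb{R}$. If $c \notin \mathbb{Q}$, choose a sequence $(x_n)$ of rational
numbers such that $x_n \to c$ (possible since $\mathbb{Q}$ is dense in $\mathbb{R}$). Then $x_n \to c$ and
$f(x_n) \to 1$ but $f(c) = 0$. If $c \in \mathbb{Q}$, choose a sequence $(x_n)$ of irrational numbers
such that $x_n \to c$; for example if $c = p/q$, we can take

$$x_n = \frac{p}{q} + \frac{\sqrt{2}}{n},$$

since $x_n \in \mathbb{Q}$ would imply that $\sqrt{2} \in \mathbb{Q}$. Then $x_n \to c$ and $f(x_n) \to 0$ but $f(c) = 1$.
Alternatively, by taking a rational sequence $(x_n)$ and an irrational sequence $(\tilde{x}_n)$
that converge to $c$, we can see that $\lim_{x \to c} f(x)$ does not exist for any $c \in \mathbb{R}$.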


Example 7.14. The Thomae function $f : \mathbb{R} \to \mathbb{R}$ is defined by

$$f(x) = \begin{cases} 1/q & \text{if } x = p/q \in \mathbb{Q} \text{ where } p \text{ and } q > 0 \text{ are relatively prime}, \\ 0 & \text{if } x \notin \mathbb{Q} \text{ or } x = 0. \end{cases}$$

Figure 2 shows the graph of $f$ on $[0,1]$. The Thomae function is continuous at $0$
and every irrational number and discontinuous at every nonzero rational number.


To prove this claim, first suppose that $x = p/q \in \mathbb{Q} \setminus \{0\}$ is rational and
nonzero. Then $f(x) = 1/q > 0$, but for every $\delta > 0$, the interval $(x - \delta, x + \delta)$
contains irrational points $y$ such that $f(y) = 0$ and $|f(x) - f(y)| = 1/q$. The
definition of continuity therefore fails if $0 < \epsilon \le 1/q$, and $f$ is discontinuous at $x$.

Second, suppose that $x \notin \mathbb{Q}$ is irrational. Given $\epsilon > 0$, choose $n \in \mathbb{N}$ such that
$1/n < \epsilon$. There are finitely many rational numbers $r = p/q$ in the interval
$(x - 1, x + 1)$ with $p$, $q$ relatively prime and $1 \le q \le n$; we list them as $\{r_1, r_2, \dots, r_m\}$.
Choose

$$\delta = \min\{ |x - r_k| : k = 1, 2, \dots, m \}$$

to be the distance of $x$ to the closest such rational number. Then $\delta > 0$ since
$x \notin \mathbb{Q}$. Furthermore, if $|x - y| < \delta$, then either $y$ is irrational and $f(y) = 0$, or
$y = p/q$ in lowest terms with $q > n$ and $f(y) = 1/q < 1/n < \epsilon$. In either case,
$|f(x) - f(y)| = |f(y)| < \epsilon$, which proves that $f$ is continuous at $x \notin \mathbb{Q}$.


Figure 2. A plot of the Thomae function in Example 7.14 on $[0, 1]$.


The continuity of $f$ at $0$ follows immediately from the inequality $0 \le f(x) \le |x|$
for all $x \in \mathbb{R}$.
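The choice of $\delta$ in the proof for irrational $x$ can be explored with a short Python sketch. It is only an illustration under stated assumptions: the names `thomae` and `delta_for` are hypothetical, the function is evaluated only at rational inputs (represented exactly by `fractions.Fraction`), and the irrational point is replaced by a floating-point stand-in.

```python
from fractions import Fraction
from math import ceil, floor, sqrt

def thomae(x: Fraction) -> Fraction:
    """Thomae function at a rational point: 1/q for x = p/q in lowest terms, and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    # Fraction normalizes to lowest terms with a positive denominator.
    return Fraction(1, x.denominator)

def delta_for(x: float, n: int) -> float:
    """Distance from x to the nearest rational p/q with 1 <= q <= n.

    x is a float stand-in for an irrational number."""
    best = 1.0
    for q in range(1, n + 1):
        # For a fixed q, the closest p/q to x has p = floor(x*q) or p = ceil(x*q).
        for p in (floor(x * q), ceil(x * q)):
            best = min(best, abs(x - p / q))
    return best

x = sqrt(2) - 1          # float stand-in for an irrational point of (0, 1)
n = 10                   # plays the role of n with 1/n < epsilon
delta = delta_for(x, n)
print(thomae(Fraction(3, 6)), delta)
```

Every rational within distance `delta` of `x` then has a reduced denominator larger than `n`, so the function takes values smaller than $1/n$ there, as in the proof.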


We give a rough classification of discontinuities of a function $f : A \to \mathbb{R}$ at an
accumulation point $c \in A$ as follows.


(1) Removable discontinuity: $\lim_{x \to c} f(x) = L$ exists but $L \ne f(c)$, in which case
we can make $f$ continuous at $c$ by redefining $f(c) = L$ (see Example 7.12).


(2) Jump discontinuity: $\lim_{x \to c} f(x)$ doesn't exist, but both the left and right
limits $\lim_{x \to c^-} f(x)$, $\lim_{x \to c^+} f(x)$ exist and are different (see Example 7.9).


(3) Essential discontinuity: $\lim_{x \to c} f(x)$ doesn't exist and at least one of the left
or right limits $\lim_{x \to c^-} f(x)$, $\lim_{x \to c^+} f(x)$ doesn't exist (see, for instance,
Example 7.10).


7.2. Properties of continuous functions 


The basic properties of continuous functions follow from those of limits. 


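Theorem 7.15. If $f, g : A \to \mathbb{R}$ are continuous at $c \in A$ and $k \in \mathbb{R}$, then $kf$, $f + g$,
and $fg$ are continuous at $c$. Moreover, if $g(c) \ne 0$ then $f/g$ is continuous at $c$.


Proof. This result follows immediately from the corresponding algebraic properties
of limits of functions (Section 6.3).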


A polynomial function is a function $P : \mathbb{R} \to \mathbb{R}$ of the form

$$P(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$

where $a_0, a_1, a_2, \dots, a_n$ are real coefficients. A rational function $R$ is a ratio of
polynomials $P$, $Q$:

$$R(x) = \frac{P(x)}{Q(x)}.$$

The domain of $R$ is the set of points in $\mathbb{R}$ such that $Q \ne 0$.


Corollary 7.16. Every polynomial function is continuous on $\mathbb{R}$ and every rational
function is continuous on its domain.


Proof. The constant function $f(x) = 1$ and the identity function $g(x) = x$ are
continuous on $\mathbb{R}$. Repeated application of Theorem 7.15 for scalar multiples, sums,
and products implies that every polynomial is continuous on $\mathbb{R}$. It also follows that
a rational function $R = P/Q$ is continuous at every point where $Q \ne 0$.


Example 7.17. The function $f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \frac{x + 3x^3 + 5x^5}{1 + x^2 + x^4}$$

is continuous on $\mathbb{R}$ since it is a rational function whose denominator never vanishes.


In addition to forming sums, products and quotients, another way to build up
more complicated functions from simpler functions is by composition. We recall
that the composition $g \circ f$ of functions $f$, $g$ is defined by $(g \circ f)(x) = g(f(x))$. The
next theorem states that the composition of continuous functions is continuous;
note carefully the points at which we assume $f$ and $g$ are continuous.


Theorem 7.18. Let $f : A \to \mathbb{R}$ and $g : B \to \mathbb{R}$ where $f(A) \subset B$. If $f$ is continuous
at $c \in A$ and $g$ is continuous at $f(c) \in B$, then $g \circ f : A \to \mathbb{R}$ is continuous at $c$.


Proof. Let $\epsilon > 0$ be given. Since $g$ is continuous at $f(c)$, there exists $\eta > 0$ such
that

$$|y - f(c)| < \eta \text{ and } y \in B \text{ implies that } |g(y) - g(f(c))| < \epsilon.$$

Next, since $f$ is continuous at $c$, there exists $\delta > 0$ such that

$$|x - c| < \delta \text{ and } x \in A \text{ implies that } |f(x) - f(c)| < \eta.$$

Combining these inequalities, we get that

$$|x - c| < \delta \text{ and } x \in A \text{ implies that } |g(f(x)) - g(f(c))| < \epsilon,$$

which proves that $g \circ f$ is continuous at $c$.


Corollary 7.19. Let $f : A \to \mathbb{R}$ and $g : B \to \mathbb{R}$ where $f(A) \subset B$. If $f$ is continuous
on $A$ and $g$ is continuous on $f(A)$, then $g \circ f$ is continuous on $A$.


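Example 7.20. The function

$$f(x) = \begin{cases} 1/\sin x & \text{if } x \ne n\pi \text{ for } n \in \mathbb{Z}, \\ 0 & \text{if } x = n\pi \text{ for } n \in \mathbb{Z} \end{cases}$$

is continuous on $\mathbb{R} \setminus \{n\pi : n \in \mathbb{Z}\}$, since it is the composition of $x \mapsto \sin x$, which is
continuous on $\mathbb{R}$, and $y \mapsto 1/y$, which is continuous on $\mathbb{R} \setminus \{0\}$, and $\sin x \ne 0$ when
$x \ne n\pi$. It is discontinuous at $x = n\pi$ because it is not locally bounded at those
points.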


Example 7.21. The function

$$f(x) = \begin{cases} \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$

is continuous on $\mathbb{R} \setminus \{0\}$, since it is the composition of $x \mapsto 1/x$, which is continuous
on $\mathbb{R} \setminus \{0\}$, and $y \mapsto \sin y$, which is continuous on $\mathbb{R}$.


Example 7.22. The function

$$f(x) = \begin{cases} x \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$

is continuous on $\mathbb{R} \setminus \{0\}$ since it is a product of functions that are continuous on
$\mathbb{R} \setminus \{0\}$. As shown in Example 7.12, $f$ is also continuous at $0$, so $f$ is continuous
on $\mathbb{R}$.


7.3. Uniform continuity 


Uniform continuity is a subtle but powerful strengthening of continuity. 


Definition 7.23. Let $f : A \to \mathbb{R}$, where $A \subset \mathbb{R}$. Then $f$ is uniformly continuous
on $A$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that

$$|x - y| < \delta \text{ and } x, y \in A \text{ implies that } |f(x) - f(y)| < \epsilon.$$


The key point of this definition is that $\delta$ depends only on $\epsilon$, not on $x$, $y$. A
uniformly continuous function on $A$ is continuous at every point of $A$, but the
converse is not true.

To explain this point in more detail, note that if a function $f$ is continuous on
$A$, then given $\epsilon > 0$ and $c \in A$, there exists $\delta(\epsilon, c) > 0$ such that

$$|x - c| < \delta(\epsilon, c) \text{ and } x \in A \text{ implies that } |f(x) - f(c)| < \epsilon.$$

If for some $\epsilon_0 > 0$ we have

$$\inf_{c \in A} \delta(\epsilon_0, c) = 0$$

however we choose $\delta(\epsilon_0, c) > 0$ in the definition of continuity, then no $\delta_0(\epsilon_0) > 0$
depending only on $\epsilon_0$ works simultaneously for every $c \in A$. In that case, the
function is continuous on $A$ but not uniformly continuous.

Before giving some examples, we state a sequential condition for uniform continuity
to fail.


Proposition 7.24. A function $f : A \to \mathbb{R}$ is not uniformly continuous on $A$ if and
only if there exist $\epsilon_0 > 0$ and sequences $(x_n)$, $(y_n)$ in $A$ such that

$$\lim_{n \to \infty} |x_n - y_n| = 0 \quad \text{and} \quad |f(x_n) - f(y_n)| \ge \epsilon_0 \text{ for all } n \in \mathbb{N}.$$

Proof. If $f$ is not uniformly continuous, then there exists $\epsilon_0 > 0$ such that for every
$\delta > 0$ there are points $x, y \in A$ with $|x - y| < \delta$ and $|f(x) - f(y)| \ge \epsilon_0$. Choosing
$x_n, y_n \in A$ to be any such points for $\delta = 1/n$, we get the required sequences.
Conversely, if the sequential condition holds, then for every $\delta > 0$ there exists
$n \in \mathbb{N}$ such that $|x_n - y_n| < \delta$ and $|f(x_n) - f(y_n)| \ge \epsilon_0$. It follows that the uniform
continuity condition in Definition 7.23 cannot hold for any $\delta > 0$ if $\epsilon = \epsilon_0$, so $f$ is
not uniformly continuous.


Example 7.25. Example 7.8 shows that the sine function is uniformly continuous
on $\mathbb{R}$, since we can take $\delta = \epsilon$ for every $x, y \in \mathbb{R}$.


Example 7.26. Define $f : [0,1] \to \mathbb{R}$ by $f(x) = x^2$. Then $f$ is uniformly continuous
on $[0,1]$. To prove this, note that for all $x, y \in [0,1]$ we have

$$|x^2 - y^2| = |x + y| \, |x - y| \le 2 |x - y|,$$

so we can take $\delta = \epsilon/2$ in the definition of uniform continuity. Similarly, $f(x) = x^2$
is uniformly continuous on any bounded set.


Example 7.27. The function $f(x) = x^2$ is continuous but not uniformly continuous
on $\mathbb{R}$. We have already proved that $f$ is continuous on $\mathbb{R}$ (it's a polynomial). To
prove that $f$ is not uniformly continuous, let

$$x_n = n, \qquad y_n = n + \frac{1}{n}.$$

Then

$$\lim_{n \to \infty} |x_n - y_n| = \lim_{n \to \infty} \frac{1}{n} = 0,$$

but

$$|f(x_n) - f(y_n)| = \left( n + \frac{1}{n} \right)^2 - n^2 = 2 + \frac{1}{n^2} \ge 2 \qquad \text{for every } n \in \mathbb{N}.$$

It follows from Proposition 7.24 that $f$ is not uniformly continuous on $\mathbb{R}$. The
problem here is that in order to prove the continuity of $f$ at $c$, given $\epsilon > 0$ we need
to make $\delta(\epsilon, c)$ smaller as $c$ gets larger, and $\delta(\epsilon, c) \to 0$ as $c \to \infty$.
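The sequences in Example 7.27, $x_n = n$ and $y_n = n + 1/n$, can be checked with exact rational arithmetic. The sketch below is only an illustration; the helper name `witness` is an assumption introduced here, and `fractions.Fraction` is used so that no rounding error enters.

```python
from fractions import Fraction

def witness(n: int):
    """Return (|x_n - y_n|, |f(x_n) - f(y_n)|) for f(x) = x^2, x_n = n, y_n = n + 1/n."""
    x_n = Fraction(n)
    y_n = n + Fraction(1, n)
    gap = abs(x_n - y_n)            # equals 1/n
    jump = abs(x_n**2 - y_n**2)     # equals 2 + 1/n^2
    return gap, jump

for n in (1, 10, 100, 1000):
    gap, jump = witness(n)
    print(n, gap, jump, jump >= 2)
```

The gap between the points shrinks to zero while the difference of the function values stays at least 2, which is the sequential condition of Proposition 7.24 with $\epsilon_0 = 2$.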


Example 7.28. The function $f : (0,1] \to \mathbb{R}$ defined by

$$f(x) = \frac{1}{x}$$

is continuous but not uniformly continuous on $(0,1]$. It is continuous on $(0,1]$ since
it's a rational function whose denominator $x$ is nonzero in $(0,1]$. To prove that $f$
is not uniformly continuous, we define $x_n, y_n \in (0,1]$ for $n \in \mathbb{N}$ by

$$x_n = \frac{1}{n}, \qquad y_n = \frac{1}{n+1}.$$

Then $|x_n - y_n| \to 0$ as $n \to \infty$, but

$$|f(x_n) - f(y_n)| = (n + 1) - n = 1 \qquad \text{for every } n \in \mathbb{N}.$$

It follows from Proposition 7.24 that $f$ is not uniformly continuous on $(0,1]$. The
problem here is that given $\epsilon > 0$, we need to make $\delta(\epsilon, c)$ smaller as $c$ gets closer to
$0$, and $\delta(\epsilon, c) \to 0$ as $c \to 0^+$.


The non-uniformly continuous functions in the last two examples were unbounded.
However, even bounded continuous functions can fail to be uniformly
continuous if they oscillate arbitrarily quickly.


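Example 7.29. Define $f : (0,1] \to \mathbb{R}$ by

$$f(x) = \sin\left( \frac{1}{x} \right).$$

Then $f$ is continuous on $(0,1]$ but it isn't uniformly continuous on $(0,1]$. To prove
this, define $x_n, y_n \in (0,1]$ for $n \in \mathbb{N}$ by

$$x_n = \frac{1}{2n\pi}, \qquad y_n = \frac{1}{2n\pi + \pi/2}.$$

Then $|x_n - y_n| \to 0$ as $n \to \infty$, but

$$|f(x_n) - f(y_n)| = \sin\left( 2n\pi + \frac{\pi}{2} \right) - \sin 2n\pi = 1 \qquad \text{for all } n \in \mathbb{N}.$$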
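A floating-point sketch of Example 7.29 shows the same behavior numerically: the points $x_n = 1/(2n\pi)$ and $y_n = 1/(2n\pi + \pi/2)$ approach each other while the values of $\sin(1/x)$ stay about 1 apart. The helper name `pair` is an assumption made for this illustration, and the printed differences equal 1 only up to rounding error.

```python
import math

def f(x: float) -> float:
    """f(x) = sin(1/x) on (0, 1]."""
    return math.sin(1.0 / x)

def pair(n: int):
    """The points x_n = 1/(2 n pi) and y_n = 1/(2 n pi + pi/2) from the example."""
    x_n = 1.0 / (2 * n * math.pi)
    y_n = 1.0 / (2 * n * math.pi + math.pi / 2)
    return x_n, y_n

for n in (1, 10, 100, 1000):
    x_n, y_n = pair(n)
    # |x_n - y_n| tends to 0, while |f(x_n) - f(y_n)| stays close to 1.
    print(n, abs(x_n - y_n), abs(f(x_n) - f(y_n)))
```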


It isn't a coincidence that these examples of non-uniformly continuous functions
have domains that are either unbounded or not closed. We will prove in Section 7.5
that a continuous function on a compact set is uniformly continuous.


7.4. Continuous functions and open sets 


Let $f : A \to \mathbb{R}$ be a function. Recall that if $B \subset A$, then the image of $B$ under $f$
is the set

$$f(B) = \{ y \in \mathbb{R} : y = f(x) \text{ for some } x \in B \},$$

and if $C \subset \mathbb{R}$, then the inverse image, or preimage, of $C$ under $f$ is the set

$$f^{-1}(C) = \{ x \in A : f(x) \in C \}.$$

The next example illustrates how open sets behave under continuous functions.


Example 7.30. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$, and consider the open interval
$I = (1,4)$. Then both $f(I) = (1,16)$ and $f^{-1}(I) = (-2,-1) \cup (1,2)$ are open. There
are two intervals in the inverse image of $I$ because $f$ is two-to-one on $f^{-1}(I)$. On
the other hand, if $J = (-1,1)$, then

$$f(J) = [0,1), \qquad f^{-1}(J) = (-1,1),$$

so the inverse image of the open interval $J$ is open, but the image is not.


Thus, a continuous function needn’t map open sets to open sets. As we will 
show, however, the inverse image of an open set under a continuous function is 
always open. This property is the topological definition of a continuous function; 
it is a global definition in the sense that it is equivalent to the continuity of the 
function at every point of its domain. 

Recall from Section 5.1 that a subset $B$ of a set $A \subset \mathbb{R}$ is relatively open in
$A$, or open in $A$, if $B = A \cap U$ where $U$ is open in $\mathbb{R}$. Moreover, as stated in
Proposition 5.13, $B$ is relatively open in $A$ if and only if every point $x \in B$ has a
relative neighborhood $C = A \cap V$ such that $C \subset B$, where $V$ is a neighborhood of
$x$ in $\mathbb{R}$.


Theorem 7.31. A function $f : A \to \mathbb{R}$ is continuous on $A$ if and only if $f^{-1}(V)$ is
open in $A$ for every set $V$ that is open in $\mathbb{R}$.


Proof. First assume that $f$ is continuous on $A$, and suppose that $c \in f^{-1}(V)$.
Then $f(c) \in V$ and since $V$ is open it contains an $\epsilon$-neighborhood

$$V_\epsilon(f(c)) = (f(c) - \epsilon, f(c) + \epsilon)$$

of $f(c)$. Since $f$ is continuous at $c$, there is a $\delta$-neighborhood

$$U_\delta(c) = (c - \delta, c + \delta)$$

of $c$ such that

$$f(A \cap U_\delta(c)) \subset V_\epsilon(f(c)).$$

This statement just says that if $|x - c| < \delta$ and $x \in A$, then $|f(x) - f(c)| < \epsilon$. It
follows that

$$A \cap U_\delta(c) \subset f^{-1}(V),$$

meaning that $f^{-1}(V)$ contains a relative neighborhood of $c$. Therefore $f^{-1}(V)$ is
relatively open in $A$.

Conversely, assume that $f^{-1}(V)$ is open in $A$ for every open $V$ in $\mathbb{R}$, and let
$c \in A$. Then the preimage of the $\epsilon$-neighborhood $(f(c) - \epsilon, f(c) + \epsilon)$ is open in $A$, so
it contains a relative $\delta$-neighborhood $A \cap (c - \delta, c + \delta)$. It follows that $|f(x) - f(c)| < \epsilon$
if $|x - c| < \delta$ and $x \in A$, which means that $f$ is continuous at $c$.


As one illustration of how we can use this result, we prove that continuous
functions map intervals to intervals. (See Definition 5.62 for what we mean by an
interval.) In view of Theorem 5.63, this is a special case of the fact that continuous
functions map connected sets to connected sets (see Theorem 13.82).


Theorem 7.32. Suppose that $f : I \to \mathbb{R}$ is continuous and $I \subset \mathbb{R}$ is an interval.
Then $f(I)$ is an interval.


Proof. Suppose that $f(I)$ is not an interval. Then, by Theorem 5.63, $f(I)$ is disconnected,
and there exist nonempty, disjoint open sets $U$, $V$ such that $U \cup V \supset f(I)$.
Since $f$ is continuous, $f^{-1}(U)$, $f^{-1}(V)$ are open in $I$ from Theorem 7.31. Furthermore,
$f^{-1}(U)$, $f^{-1}(V)$ are nonempty and disjoint, and $f^{-1}(U) \cup f^{-1}(V) = I$. It follows
that $I$ is disconnected and therefore $I$ is not an interval from Theorem 5.63. This
shows that if $I$ is an interval, then $f(I)$ is also an interval.


We can also define open functions, or open mappings. Although they form an 
important class of functions, they aren’t as fundamental as continuous functions. 


Definition 7.33. An open mapping on a set $A \subset \mathbb{R}$ is a function $f : A \to \mathbb{R}$ such
that $f(B)$ is open in $\mathbb{R}$ for every set $B \subset A$ that is open in $A$.


A continuous function needn't be open, but if $f : A \to \mathbb{R}$ is continuous and
one-to-one, then $f^{-1} : f(A) \to \mathbb{R}$ is open.


Example 7.34. Example 7.30 shows that the square function $f : \mathbb{R} \to \mathbb{R}$ defined
by $f(x) = x^2$ is not an open mapping on $\mathbb{R}$. On the other hand $f : [0,\infty) \to \mathbb{R}$ is
open because it is one-to-one with a continuous inverse $f^{-1} : [0,\infty) \to \mathbb{R}$ given by

$$f^{-1}(x) = \sqrt{x}.$$




7.5. Continuous functions on compact sets 


Continuous functions on compact sets have especially nice properties. For example,
they are bounded and attain their maximum and minimum values, and they are
uniformly continuous. Since a closed, bounded interval is compact, these results
apply, in particular, to continuous functions $f : [a,b] \to \mathbb{R}$.

First, we prove that the continuous image of a compact set in $\mathbb{R}$ is compact.
This is a special case of the fact that continuous functions map compact sets to
compact sets (see Theorem 13.82).


Theorem 7.35. If $K \subset \mathbb{R}$ is compact and $f : K \to \mathbb{R}$ is continuous, then $f(K)$ is
compact.


Proof. We will give two proofs, one using sequences and the other using open
covers.

We show that $f(K)$ is sequentially compact. Let $(y_n)$ be a sequence in $f(K)$.
Then $y_n = f(x_n)$ for some $x_n \in K$. Since $K$ is compact, the sequence $(x_n)$ has a
convergent subsequence $(x_{n_i})$ such that

$$\lim_{i \to \infty} x_{n_i} = x$$

where $x \in K$. Since $f$ is continuous on $K$,

$$\lim_{i \to \infty} f(x_{n_i}) = f(x).$$

Writing $y = f(x)$, we have $y \in f(K)$ and

$$\lim_{i \to \infty} y_{n_i} = y.$$

Therefore every sequence $(y_n)$ in $f(K)$ has a convergent subsequence whose limit
belongs to $f(K)$, so $f(K)$ is compact.


As an alternative proof, we show that $f(K)$ has the Heine-Borel property. Suppose
that $\{V_i : i \in I\}$ is an open cover of $f(K)$. Since $f$ is continuous, Theorem 7.31
implies that $f^{-1}(V_i)$ is open in $K$, so $\{f^{-1}(V_i) : i \in I\}$ is an open cover of $K$. Since
$K$ is compact, there is a finite subcover

$$\left\{ f^{-1}(V_{i_1}), f^{-1}(V_{i_2}), \dots, f^{-1}(V_{i_n}) \right\}$$

of $K$, and it follows that

$$\left\{ V_{i_1}, V_{i_2}, \dots, V_{i_n} \right\}$$

is a finite subcover of the original open cover of $f(K)$. This proves that $f(K)$ is
compact.


Note that compactness is essential here; it is not true, in general, that a continuous
function maps closed sets to closed sets.


Example 7.36. Define $f : \mathbb{R} \to \mathbb{R}$ by

$$f(x) = \frac{1}{1 + x^2}.$$

Then $[0,\infty)$ is closed but $f([0,\infty)) = (0,1]$ is not.


Figure 3. A plot of the function $y = x + x \sin(1/x)$ on $[0, 2/\pi]$ and a detail
near the origin.


The following result is one of the most important properties of continuous functions
on compact sets.


Theorem 7.37 (Weierstrass extreme value). If $f : K \to \mathbb{R}$ is continuous and
$K \subset \mathbb{R}$ is compact, then $f$ is bounded on $K$ and $f$ attains its maximum and
minimum values on $K$.


Proof. The image $f(K)$ is compact from Theorem 7.35. Proposition 5.40 implies
that $f(K)$ is bounded and the maximum $M$ and minimum $m$ belong to $f(K)$.
Therefore there are points $x, y \in K$ such that $f(x) = M$, $f(y) = m$, and $f$ attains
its maximum and minimum on $K$.


Example 7.38. Define $f : [0,1] \to \mathbb{R}$ by

$$f(x) = \begin{cases} 1/x & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}$$

Then $f$ is unbounded on $[0,1]$ and has no maximum value ($f$ does, however, have
a minimum value of $0$ attained at $x = 0$). In this example, $[0,1]$ is compact but $f$
is discontinuous at $0$, which shows that a discontinuous function on a compact set
needn't be bounded.


Example 7.39. Define $f : (0,1] \to \mathbb{R}$ by $f(x) = 1/x$. Then $f$ is unbounded on
$(0,1]$ with no maximum value ($f$ does, however, have a minimum value of $1$ attained
at $x = 1$). In this example, $f$ is continuous but the half-open interval $(0,1]$ isn't
compact, which shows that a continuous function on a non-compact set needn't be
bounded.


Example 7.40. Define $f : (0,1) \to \mathbb{R}$ by $f(x) = x$. Then

$$\inf_{x \in (0,1)} f(x) = 0, \qquad \sup_{x \in (0,1)} f(x) = 1,$$

but $f(x) \ne 0$, $f(x) \ne 1$ for any $0 < x < 1$. Thus, even if a continuous function on
a non-compact set is bounded, it needn't attain its supremum or infimum.


Example 7.41. Define $f : [0, 2/\pi] \to \mathbb{R}$ by

$$f(x) = \begin{cases} x + x \sin(1/x) & \text{if } 0 < x \le 2/\pi, \\ 0 & \text{if } x = 0. \end{cases}$$

(See Figure 3.) Then $f$ is continuous on the compact interval $[0, 2/\pi]$, so by
Theorem 7.37 it attains its maximum and minimum. For $0 \le x \le 2/\pi$, we have
$0 \le f(x) \le 4/\pi$ since $|\sin(1/x)| \le 1$. Thus, the minimum value of $f$ is $0$, attained
at $x = 0$. It is also attained at infinitely many other interior points in the interval,

$$x_n = \frac{1}{2n\pi + 3\pi/2}, \qquad n = 0, 1, 2, 3, \dots,$$

where $\sin(1/x_n) = -1$. The maximum value of $f$ is $4/\pi$, attained at $x = 2/\pi$.
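The extreme values in Example 7.41 can be checked numerically. In the sketch below, which is only an illustration and uses the hypothetical helper name `x_min`, the function is evaluated at the interior points $x_n = 1/(2n\pi + 3\pi/2)$, where $\sin(1/x_n) = -1$, and at the right endpoint, where $f(2/\pi) = (2/\pi)(1 + \sin(\pi/2)) = 4/\pi$. Floating-point values agree with the exact ones only up to rounding.

```python
import math

def f(x: float) -> float:
    """f(x) = x + x sin(1/x) for 0 < x <= 2/pi, and f(0) = 0."""
    return 0.0 if x == 0 else x + x * math.sin(1.0 / x)

def x_min(n: int) -> float:
    """Interior minimizers x_n = 1/(2 n pi + 3 pi/2), where sin(1/x_n) = -1."""
    return 1.0 / (2 * n * math.pi + 3 * math.pi / 2)

for n in range(4):
    print(n, x_min(n), f(x_min(n)))      # values of f are zero up to rounding

print(f(2 / math.pi), 4 / math.pi)       # value at the right endpoint compared with 4/pi
```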


Finally, we prove that continuous functions on compact sets are uniformly
continuous.


Theorem 7.42. If $f : K \to \mathbb{R}$ is continuous and $K \subset \mathbb{R}$ is compact, then $f$ is
uniformly continuous on $K$.


Proof. Suppose for contradiction that $f$ is not uniformly continuous on $K$. Then
from Proposition 7.24 there exist $\epsilon_0 > 0$ and sequences $(x_n)$, $(y_n)$ in $K$ such that

$$\lim_{n \to \infty} |x_n - y_n| = 0 \quad \text{and} \quad |f(x_n) - f(y_n)| \ge \epsilon_0 \text{ for every } n \in \mathbb{N}.$$

Since $K$ is compact, there is a convergent subsequence $(x_{n_i})$ of $(x_n)$ such that

$$\lim_{i \to \infty} x_{n_i} = x \in K.$$

Moreover, since $(x_n - y_n) \to 0$ as $n \to \infty$, it follows that

$$\lim_{i \to \infty} y_{n_i} = \lim_{i \to \infty} \left[ x_{n_i} - (x_{n_i} - y_{n_i}) \right] = \lim_{i \to \infty} x_{n_i} - \lim_{i \to \infty} (x_{n_i} - y_{n_i}) = x,$$

so $(y_{n_i})$ also converges to $x$. Then, since $f$ is continuous on $K$,

$$\lim_{i \to \infty} |f(x_{n_i}) - f(y_{n_i})| = \left| \lim_{i \to \infty} f(x_{n_i}) - \lim_{i \to \infty} f(y_{n_i}) \right| = |f(x) - f(x)| = 0,$$

but this contradicts the non-uniform continuity condition

$$|f(x_{n_i}) - f(y_{n_i})| \ge \epsilon_0.$$

Therefore $f$ is uniformly continuous.


Example 7.43. The function $f : [0, 2/\pi] \to \mathbb{R}$ defined in Example 7.41 is uniformly
continuous on $[0, 2/\pi]$ since it is continuous and $[0, 2/\pi]$ is compact.


7.6. The intermediate value theorem 


The intermediate value theorem states that a continuous function on an interval 
takes on all values between any two of its values. We first prove a special case. 


Theorem 7.44 (Intermediate value). Suppose that $f : [a,b] \to \mathbb{R}$ is a continuous
function on a closed, bounded interval. If $f(a) < 0$ and $f(b) > 0$, or $f(a) > 0$ and
$f(b) < 0$, then there is a point $a < c < b$ such that $f(c) = 0$.


Proof. Assume for definiteness that $f(a) < 0$ and $f(b) > 0$. (If $f(a) > 0$ and
$f(b) < 0$, consider $-f$ instead of $f$.) The set

$$E = \{ x \in [a,b] : f(x) < 0 \}$$

is nonempty, since $a \in E$, and $E$ is bounded from above by $b$. Let

$$c = \sup E \in [a,b],$$

which exists by the completeness of $\mathbb{R}$. We claim that $f(c) = 0$.


Suppose for contradiction that $f(c) \ne 0$. Since $f$ is continuous at $c$, there exists
$\delta > 0$ such that

$$|x - c| < \delta \text{ and } x \in [a,b] \text{ implies that } |f(x) - f(c)| < \frac{1}{2} |f(c)|.$$

If $f(c) < 0$, then $c \ne b$ and

$$f(x) = f(c) + f(x) - f(c) < f(c) - \frac{1}{2} f(c)$$

for all $x \in [a,b]$ such that $|x - c| < \delta$, so $f(x) < \frac{1}{2} f(c) < 0$. It follows that there are
points $x \in E$ with $x > c$, which contradicts the fact that $c$ is an upper bound of $E$.
If $f(c) > 0$, then $c \ne a$ and

$$f(x) = f(c) + f(x) - f(c) > f(c) - \frac{1}{2} f(c)$$

for all $x \in [a,b]$ such that $|x - c| < \delta$, so $f(x) > \frac{1}{2} f(c) > 0$. It follows that there
exists $\eta > 0$ such that $c - \eta \ge a$ and

$$f(x) > 0 \qquad \text{for } c - \eta \le x \le c.$$

In that case, $c - \eta < c$ is an upper bound for $E$, since $c$ is an upper bound and
$f(x) > 0$ for $c - \eta \le x \le c$, which contradicts the fact that $c$ is the least upper
bound. This proves that $f(c) = 0$. Finally, $c \ne a, b$ since $f$ is nonzero at the
endpoints, so $a < c < b$.


We give some examples to show that all of the hypotheses in this theorem are 
necessary. 


Example 7.45. Let $K = [-2,-1] \cup [1,2]$ and define $f : K \to \mathbb{R}$ by

$$f(x) = \begin{cases} -1 & \text{if } -2 \le x \le -1, \\ 1 & \text{if } 1 \le x \le 2. \end{cases}$$

Then $f(-2) < 0$ and $f(2) > 0$, but $f$ doesn't vanish at any point in its domain.
Thus, in general, Theorem 7.44 fails if the domain of $f$ is not a connected interval
$[a,b]$.


Example 7.46. Define $f : [-1,1] \to \mathbb{R}$ by

$$f(x) = \begin{cases} -1 & \text{if } -1 \le x \le 0, \\ 1 & \text{if } 0 < x \le 1. \end{cases}$$

Then $f(-1) < 0$ and $f(1) > 0$, but $f$ doesn't vanish at any point in its domain.
Here, $f$ is defined on an interval but it is discontinuous at $0$. Thus, in general,
Theorem 7.44 fails for discontinuous functions.


As one immediate consequence of the intermediate value theorem, we show that
the real numbers contain the square root of 2.

Example 7.47. Define the continuous function $f : [1,2] \to \mathbb{R}$ by

$$f(x) = x^2 - 2.$$

Then $f(1) < 0$ and $f(2) > 0$, so Theorem 7.44 implies that there exists $1 < c < 2$
such that $c^2 = 2$. Moreover, since $x^2 - 2$ is strictly increasing on $[0,\infty)$, there is a
unique such positive number, and we have proved the existence of $\sqrt{2}$.

We can get more accurate approximations to $\sqrt{2}$ by repeatedly bisecting the
interval $[1,2]$. For example $f(3/2) = 1/4 > 0$ so $1 < \sqrt{2} < 3/2$, and $f(5/4) < 0$
so $5/4 < \sqrt{2} < 3/2$, and so on. This bisection method is a simple, but useful,
algorithm for computing numerical approximations of solutions of $f(x) = 0$ where
$f$ is a continuous function.
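The bisection method described above can be sketched in a few lines of Python. This is a minimal illustration, not a production root finder: the function name `bisect`, the tolerance parameter `tol`, and the stopping rule are assumptions introduced here, and the code works in floating-point arithmetic rather than with exact real numbers.

```python
def bisect(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Approximate a zero of a continuous f on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must be nonzero with opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if (fm < 0) == (fa < 0):
            a, fa = m, fm      # f(m) has the sign of f(a): the sign change lies in [m, b]
        else:
            b = m              # f(m) has the sign of f(b): the sign change lies in [a, m]
    return (a + b) / 2

root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)
```

Each step halves the interval while keeping a sign change inside it, so the first two steps reproduce the intervals $[1, 3/2]$ and $[5/4, 3/2]$ from the example.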

Note that we used the existence of a supremum in the proof of Theorem 7.44. If
we restrict $f(x) = x^2 - 2$ to rational numbers, $f : A \to \mathbb{Q}$ where $A = [1,2] \cap \mathbb{Q}$, then
$f$ is continuous on $A$, $f(1) < 0$ and $f(2) > 0$, but $f(c) \ne 0$ for any $c \in A$ since $\sqrt{2}$
is irrational. This shows that the completeness of $\mathbb{R}$ is essential for Theorem 7.44
to hold. (Thus, in a sense, the theorem actually describes the completeness of the
continuum $\mathbb{R}$ rather than the continuity of $f$!)


The general statement of the Intermediate Value Theorem follows immediately 
from this special case. 


Theorem 7.48 (Intermediate value theorem). Suppose that $f : [a,b] \to \mathbb{R}$ is
a continuous function on a closed, bounded interval. Then for every $d$ strictly
between $f(a)$ and $f(b)$ there is a point $a < c < b$ such that $f(c) = d$.


Proof. Suppose, for definiteness, that $f(a) < f(b)$ and $f(a) < d < f(b)$. (If
$f(a) > f(b)$ and $f(b) < d < f(a)$, apply the same proof to $-f$, and if $f(a) = f(b)$
there is nothing to prove.) Let $g(x) = f(x) - d$. Then $g(a) < 0$ and $g(b) > 0$, so
Theorem 7.44 implies that $g(c) = 0$ for some $a < c < b$, meaning that $f(c) = d$.


As one consequence of our previous results, we prove that a continuous function 
maps compact intervals to compact intervals. 


Theorem 7.49. Suppose that $f : [a,b] \to \mathbb{R}$ is a continuous function on a closed,
bounded interval. Then $f([a,b]) = [m, M]$ is a closed, bounded interval.


Proof. Theorem 7.37 implies that $m \le f(x) \le M$ for all $x \in [a,b]$, where $m$ and
$M$ are the minimum and maximum values of $f$ on $[a,b]$, so $f([a,b]) \subset [m, M]$.
Moreover, there are points $c, d \in [a,b]$ such that $f(c) = m$, $f(d) = M$.

Let $J = [c,d]$ if $c \le d$ or $J = [d,c]$ if $d < c$. Then $J \subset [a,b]$, and Theorem 7.48
implies that $f$ takes on all values in $[m, M]$ on $J$. It follows that $f([a,b]) \supset [m, M]$,
so $f([a,b]) = [m, M]$.


First we give an example to illustrate the theorem.

Example 7.50. Define $f : [-1,1] \to \mathbb{R}$ by

$$f(x) = x - x^3.$$

Then, using calculus to compute the maximum and minimum of $f$, we find that

$$f([-1,1]) = [-M, M], \qquad M = \frac{2}{3\sqrt{3}}.$$

This example illustrates that $f([a,b]) \ne [f(a), f(b)]$ unless $f$ is increasing.
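The range in Example 7.50 can be approximated by sampling $f(x) = x - x^3$ on a fine grid of $[-1,1]$ and comparing the extreme sampled values with $M = 2/(3\sqrt{3})$, the value of $f$ at the critical point $x = 1/\sqrt{3}$ where $f'(x) = 1 - 3x^2 = 0$. The grid size `N` below is an arbitrary choice for illustration, and the sampled extremes agree with $\pm M$ only up to the grid resolution.

```python
import math

def f(x: float) -> float:
    return x - x**3

# Sample f on a uniform grid of [-1, 1]; the grid size is an arbitrary choice.
N = 200_000
values = [f(-1 + 2 * k / N) for k in range(N + 1)]

M = 2 / (3 * math.sqrt(3))        # value of f at the critical point x = 1/sqrt(3)
print(max(values), M)
print(min(values), -M)
print(f(-1.0), f(1.0))            # endpoint values: f(a) = f(b) = 0 here
```

Since $f(-1) = f(1) = 0$, the interval $[f(a), f(b)]$ degenerates to a single point, while the image $f([-1,1])$ is the whole interval $[-M, M]$.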


Next we give some examples to show that the continuity of $f$ and the connectedness
and compactness of the interval $[a,b]$ are essential for Theorem 7.49 to
hold.


Example 7.51. Let $\operatorname{sgn} : [-1,1] \to \mathbb{R}$ be the sign function defined in Example 6.8.
Then $f = \operatorname{sgn}$ is a discontinuous function on a compact interval $[-1,1]$, but the range
$f([-1,1]) = \{-1, 0, 1\}$ consists of three isolated points and is not an interval.


Example 7.52. In Example 7.45, the function $f : K \to \mathbb{R}$ is continuous on a
compact set $K$ but $f(K) = \{-1, 1\}$ consists of two isolated points and is not an
interval.


Example 7.53. The continuous function $f : \mathbb{R} \to \mathbb{R}$ in Example 7.36 maps the
unbounded, closed interval $[0,\infty)$ to the half-open interval $(0,1]$.


The last example shows that a continuous function may map a closed but
unbounded interval to an interval which isn't closed (or open). Nevertheless, as
shown in Theorem 7.32, a continuous function always maps intervals to intervals,
although the intervals may be open, closed, half-open, bounded, or unbounded.


7.7. Monotonic functions 


Monotonic functions have continuity properties that are not shared by general functions.

Definition 7.54. Let $I \subset \mathbb{R}$ be an interval. A function $f : I \to \mathbb{R}$ is increasing if

$$f(x_1) \le f(x_2) \qquad \text{if } x_1, x_2 \in I \text{ and } x_1 < x_2,$$

strictly increasing if

$$f(x_1) < f(x_2) \qquad \text{if } x_1, x_2 \in I \text{ and } x_1 < x_2,$$

decreasing if

$$f(x_1) \ge f(x_2) \qquad \text{if } x_1, x_2 \in I \text{ and } x_1 < x_2,$$

and strictly decreasing if

$$f(x_1) > f(x_2) \qquad \text{if } x_1, x_2 \in I \text{ and } x_1 < x_2.$$

An increasing or decreasing function is called a monotonic function, and a strictly
increasing or strictly decreasing function is called a strictly monotonic function.


A commonly used alternative (and, unfortunately, incompatible) terminology
is "nondecreasing" for "increasing," "increasing" for "strictly increasing,"
"nonincreasing" for "decreasing," and "decreasing" for "strictly decreasing." According to
our terminology, a constant function is both increasing and decreasing. Monotonic
functions are also referred to as monotone functions.


Theorem 7.55. If $f : I \to \mathbb{R}$ is monotonic on an interval $I$, then the left and right
limits of $f$,

$$\lim_{x \to c^-} f(x), \qquad \lim_{x \to c^+} f(x),$$

exist at every interior point $c$ of $I$.


Proof. Assume for definiteness that $f$ is increasing. (If $f$ is decreasing, we can
apply the same argument to $-f$, which is increasing.) We will prove that

$$\lim_{x \to c^-} f(x) = \sup E, \qquad E = \{ f(x) \in \mathbb{R} : x \in I \text{ and } x < c \}.$$

The set $E$ is nonempty since $c$ is an interior point of $I$, so there exists $x \in I$
with $x < c$, and $E$ is bounded from above by $f(c)$ since $f$ is increasing. It follows
that $L = \sup E \in \mathbb{R}$ exists. (Note that $L$ may be strictly less than $f(c)$!)

Suppose that $\epsilon > 0$ is given. Since $L$ is a least upper bound of $E$, there exists
$y_0 \in E$ such that $L - \epsilon < y_0 \le L$, and therefore there exists $x_0 \in I$ with $x_0 < c$ such that
$f(x_0) = y_0$. Let $\delta = c - x_0 > 0$. If $c - \delta < x < c$, then $x_0 < x < c$ and therefore
$f(x_0) \le f(x) \le L$ since $f$ is increasing and $L$ is an upper bound of $E$. It follows
that

$$L - \epsilon < f(x) \le L \qquad \text{if } c - \delta < x < c,$$

which proves that $\lim_{x \to c^-} f(x) = L$.

A similar argument, or the same argument applied to $g(x) = -f(-x)$, shows
that

$$\lim_{x \to c^+} f(x) = \inf \{ f(x) \in \mathbb{R} : x \in I \text{ and } x > c \}.$$

We leave the details as an exercise.


Similarly, if $I = (a, b]$ has right-endpoint $b \in I$ and $f$ is monotonic on $I$, then
the left limit $\lim_{x \to b^-} f(x)$ exists, although it may not equal $f(b)$, and if $a \in I$ is a
left-endpoint, then the right limit $\lim_{x \to a^+} f(x)$ exists, although it may not equal
$f(a)$.


Corollary 7.56. Every discontinuity of a monotonic function $f : I \to \mathbb{R}$ at an
interior point of the interval $I$ is a jump discontinuity.


Proof. If $c$ is an interior point of $I$, then the left and right limits of $f$ at $c$ exist by
the previous theorem. Moreover, assuming for definiteness that $f$ is increasing, we
have

$$f(x) \le f(c) \le f(y) \qquad \text{for all } x, y \in I \text{ with } x < c < y,$$

and since limits preserve inequalities

$$\lim_{x \to c^-} f(x) \le f(c) \le \lim_{x \to c^+} f(x).$$

If the left and right limits are equal, then the limit exists and is equal to the left
and right limits, so

$$\lim_{x \to c} f(x) = f(c),$$

meaning that $f$ is continuous at $c$. In particular, a monotonic function cannot have
a removable discontinuity at an interior point of its domain (although it can have
one at an endpoint of a closed interval). If the left and right limits are not equal,
then $f$ has a jump discontinuity at $c$, so $f$ cannot have an essential discontinuity
either.


One can show that a monotonic function has, at most, a countable number
of discontinuities, and it may have a countably infinite number, but we omit the
proof. By contrast, the non-monotonic Dirichlet function is discontinuous at every
point of $\mathbb{R}$, so it has uncountably many discontinuities.


Chapter 8 


Differentiable Functions 


A differentiable function is a function that can be approximated locally by a linear 
function. 


8.1. The derivative 


Definition 8.1. Suppose that f : (a,b) + Randa < c < b. Then f is differentiable 
at c with derivative f’(c) if 

cd | FAC) SFO y 
es oe = 110): 

The domain of f’ is the set of points c € (a,b) for which this limit exists. If the 
limit exists for every c € (a,b) then we say that f is differentiable on (a,b). 


Graphically, this definition says that the derivative of $f$ at $c$ is the slope of the
tangent line to $y = f(x)$ at $c$, which is the limit as $h \to 0$ of the slopes of the lines
through $(c, f(c))$ and $(c+h, f(c+h))$.


We can also write

\[ f'(c) = \lim_{x \to c} \left[ \frac{f(x) - f(c)}{x - c} \right], \]

since if $x = c + h$, the conditions $0 < |x - c| < \delta$ and $0 < |h| < \delta$ in the definitions
of the limits are equivalent. The ratio

\[ \frac{f(x) - f(c)}{x - c} \]

is undefined ($0/0$) at $x = c$, but it doesn't have to be defined in order for the limit
as $x \to c$ to exist.


Like continuity, differentiability is a local property. That is, the differentiability
of a function $f$ at $c$ and the value of the derivative, if it exists, depend only on the
values of $f$ in an arbitrarily small neighborhood of $c$. In particular, if $f : A \to \mathbb{R}$
where $A \subset \mathbb{R}$, then we can define the differentiability of $f$ at any interior point
$c \in A$ since there is an open interval $(a,b) \subset A$ with $c \in (a,b)$.


8.1.1. Examples of derivatives. Let us give a number of examples that illus- 
trate differentiable and non-differentiable functions. 


Example 8.2. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is differentiable on
$\mathbb{R}$ with derivative $f'(x) = 2x$ since

\[ f'(c) = \lim_{h \to 0} \frac{(c+h)^2 - c^2}{h} = \lim_{h \to 0} \frac{h(2c + h)}{h} = \lim_{h \to 0} (2c + h) = 2c. \]

Note that in computing the derivative, we first cancel by $h$, which is valid since
$h \ne 0$ in the definition of the limit, and then set $h = 0$ to evaluate the limit. This
procedure would be inconsistent if we didn't use limits.


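Example 8.3. The function $f : \mathbb{R} \to \mathbb{R}$ defined by

\[ f(x) = \begin{cases} x^2 & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \]

is differentiable on $\mathbb{R}$ with derivative

\[ f'(x) = \begin{cases} 2x & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases} \]

For $x > 0$, the derivative is $f'(x) = 2x$ as above, and for $x < 0$, we have $f'(x) = 0$.
For $x = 0$, we consider the limit

\[ \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} \frac{f(h)}{h}. \]

The right limit is

\[ \lim_{h \to 0^+} \frac{f(h)}{h} = \lim_{h \to 0^+} h = 0, \]

and the left limit is

\[ \lim_{h \to 0^-} \frac{f(h)}{h} = 0. \]

Since the left and right limits exist and are equal, the limit also exists, and $f$ is
differentiable at 0 with $f'(0) = 0$.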


Next, we consider some examples of non-differentiability at discontinuities, cor- 
ners, and cusps. 


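Example 8.4. The function $f : \mathbb{R} \to \mathbb{R}$ defined by

\[ f(x) = \begin{cases} 1/x & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]

is differentiable at $x \ne 0$ with derivative $f'(x) = -1/x^2$ since, for $c \ne 0$,

\[ \lim_{h \to 0} \left[ \frac{f(c+h) - f(c)}{h} \right]
   = \lim_{h \to 0} \left[ \frac{1/(c+h) - 1/c}{h} \right]
   = \lim_{h \to 0} \left[ \frac{c - (c+h)}{hc(c+h)} \right]
   = -\lim_{h \to 0} \frac{1}{c(c+h)}
   = -\frac{1}{c^2}. \]

However, $f$ is not differentiable at 0 since the limit

\[ \lim_{h \to 0} \left[ \frac{f(h) - f(0)}{h} \right] = \lim_{h \to 0} \left[ \frac{1/h - 0}{h} \right] = \lim_{h \to 0} \frac{1}{h^2} \]

does not exist.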


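Example 8.5. The sign function $f(x) = \operatorname{sgn} x$, defined in Example 6.8, is
differentiable at $x \ne 0$ with $f'(x) = 0$, since in that case $f(x+h) - f(x) = 0$ for all
sufficiently small $h$. The sign function is not differentiable at 0 since

\[ \lim_{h \to 0} \left[ \frac{\operatorname{sgn} h - \operatorname{sgn} 0}{h} \right] = \lim_{h \to 0} \frac{\operatorname{sgn} h}{h} \]

and

\[ \frac{\operatorname{sgn} h}{h} = \begin{cases} 1/h & \text{if } h > 0, \\ -1/h & \text{if } h < 0, \end{cases} \]

is unbounded in every neighborhood of 0, so its limit does not exist.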


Example 8.6. The absolute value function $f(x) = |x|$ is differentiable at $x \ne 0$
with derivative $f'(x) = \operatorname{sgn} x$. It is not differentiable at 0, however, since

\[ \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} \frac{|h|}{h} = \lim_{h \to 0} \operatorname{sgn} h \]

does not exist. (The right limit is $1$ and the left limit is $-1$.)


Example 8.7. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|^{1/2}$ is differentiable
at $x \ne 0$ with

\[ f'(x) = \frac{\operatorname{sgn} x}{2|x|^{1/2}}. \]

If $c > 0$, then using the difference of two squares to rationalize the numerator, we
get

\[ f'(c) = \lim_{h \to 0} \frac{(c+h)^{1/2} - c^{1/2}}{h}
         = \lim_{h \to 0} \frac{(c+h) - c}{h \left[ (c+h)^{1/2} + c^{1/2} \right]}
         = \lim_{h \to 0} \frac{1}{(c+h)^{1/2} + c^{1/2}}
         = \frac{1}{2c^{1/2}}. \]

If $c < 0$, we get the analogous result with a negative sign. However, $f$ is not
differentiable at 0, since

\[ \lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{1}{h^{1/2}} \]

does not exist.
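Example 8.8. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^{1/3}$ is differentiable at
$x \ne 0$ with

\[ f'(x) = \frac{1}{3x^{2/3}}. \]

To prove this result, we use the identity for the difference of cubes,

\[ a^3 - b^3 = (a - b)(a^2 + ab + b^2), \]

and get for $c \ne 0$ that

\[ f'(c) = \lim_{h \to 0} \frac{(c+h)^{1/3} - c^{1/3}}{h}
         = \lim_{h \to 0} \frac{(c+h) - c}{h \left[ (c+h)^{2/3} + (c+h)^{1/3} c^{1/3} + c^{2/3} \right]}
         = \lim_{h \to 0} \frac{1}{(c+h)^{2/3} + (c+h)^{1/3} c^{1/3} + c^{2/3}}
         = \frac{1}{3c^{2/3}}. \]

However, $f$ is not differentiable at 0, since

\[ \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} \frac{1}{h^{2/3}} \]

does not exist.
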
Finally, we consider some examples of highly oscillatory functions. 
Example 8.9. Define $f : \mathbb{R} \to \mathbb{R}$ by

\[ f(x) = \begin{cases} x \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]

It follows from the product and chain rules proved below that $f$ is differentiable at
$x \ne 0$ with derivative

\[ f'(x) = \sin\frac{1}{x} - \frac{1}{x} \cos\frac{1}{x}. \]

However, $f$ is not differentiable at 0, since

\[ \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} \sin\frac{1}{h}, \]

which does not exist.


Example 8.10. Define $f : \mathbb{R} \to \mathbb{R}$ by

\[ f(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]

[Figure 1. A plot of the function $y = x^2 \sin(1/x)$ and a detail near the origin
with the parabolas $y = \pm x^2$ shown in red.]

Then $f$ is differentiable on $\mathbb{R}$. (See Figure 1.) It follows from the product and chain
rules proved below that $f$ is differentiable at $x \ne 0$ with derivative

\[ f'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x}. \]

Moreover, $f$ is differentiable at 0 with $f'(0) = 0$, since

\[ \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} h \sin\frac{1}{h} = 0. \]

In this example, $\lim_{x \to 0} f'(x)$ does not exist, so although $f$ is differentiable on $\mathbb{R}$,
its derivative $f'$ is not continuous at 0.
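
The behavior in Example 8.10 can be explored numerically. The following Python sketch is only an illustration, not part of the argument: the helper names `f`, `fprime`, and `difference_quotient` are our own, and floating-point evaluation cannot replace the limit computation above. It evaluates the difference quotient at 0, whose absolute value is at most $|h|$, and samples $f'$ at points where $\cos(1/x) = \pm 1$.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x) for x != 0, and f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    """Derivative for x != 0, from the product and chain rules."""
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

def difference_quotient(h):
    """(f(h) - f(0)) / h, which equals h sin(1/h) for h != 0."""
    return (f(h) - f(0.0)) / h

# The difference quotients at 0 shrink with h.
hs = [10.0 ** (-k) for k in range(1, 8)]
quotients = [difference_quotient(h) for h in hs]

# Sample f' where 1/x is an even or an odd multiple of pi.
n = 1000
near_minus_one = fprime(1.0 / (2 * n * math.pi))       # cos(1/x) = 1
near_plus_one = fprime(1.0 / ((2 * n + 1) * math.pi))  # cos(1/x) = -1
```

The two sampled values of $f'$ stay near $-1$ and $+1$ however large $n$ is chosen, which is consistent with the fact that $f'(x)$ has no limit as $x \to 0$.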


8.1.2. Derivatives as linear approximations. Another way to view Definition 8.1
is to write

\[ f(c+h) = f(c) + f'(c)h + r(h) \]

as the sum of a linear (or, strictly speaking, affine) approximation $f(c) + f'(c)h$ of
$f(c+h)$ and a remainder $r(h)$. In general, the remainder also depends on $c$, but
we don't show this explicitly since we're regarding $c$ as fixed.


As we prove in the following proposition, the differentiability of $f$ at $c$ is
equivalent to the condition

\[ \lim_{h \to 0} \frac{r(h)}{h} = 0. \]

That is, the remainder $r(h)$ approaches 0 faster than $h$, so the linear terms in $h$
provide a leading order approximation to $f(c+h)$ when $h$ is small. We also write
this condition on the remainder as

\[ r(h) = o(h) \quad \text{as } h \to 0, \]

pronounced "$r$ is little-oh of $h$ as $h \to 0$."


Graphically, this condition means that the graph of $f$ near $c$ is close to the line
through the point $(c, f(c))$ with slope $f'(c)$. Analytically, it means that the function

\[ h \mapsto f(c+h) - f(c) \]

is approximated near $c$ by the linear function

\[ h \mapsto f'(c)h. \]

Thus, $f'(c)$ may be interpreted as a scaling factor by which a differentiable function
$f$ shrinks or stretches lengths near $c$.


If $|f'(c)| < 1$, then $f$ shrinks the length of a small interval about $c$ by
(approximately) this factor; if $|f'(c)| > 1$, then $f$ stretches the length of an interval
by (approximately) this factor; if $f'(c) > 0$, then $f$ preserves the orientation of
the interval, meaning that it maps the left endpoint to the left endpoint of the
image and the right endpoint to the right endpoint; if $f'(c) < 0$, then $f$ reverses
the orientation of the interval, meaning that it maps the left endpoint to the right
endpoint of the image and vice versa.


We can use this description as a definition of the derivative. 
Proposition 8.11. Suppose that $f : (a,b) \to \mathbb{R}$. Then $f$ is differentiable at
$c \in (a,b)$ if and only if there exist a constant $A \in \mathbb{R}$ and a function
$r : (a-c, b-c) \to \mathbb{R}$ such that

\[ f(c+h) = f(c) + Ah + r(h), \qquad \lim_{h \to 0} \frac{r(h)}{h} = 0. \]

In that case, $A = f'(c)$.


Proof. First suppose that $f$ is differentiable at $c$ according to Definition 8.1, and
define

\[ r(h) = f(c+h) - f(c) - f'(c)h. \]

Then

\[ \lim_{h \to 0} \frac{r(h)}{h} = \lim_{h \to 0} \left[ \frac{f(c+h) - f(c)}{h} - f'(c) \right] = 0, \]

so the condition in the proposition holds with $A = f'(c)$.
Conversely, suppose that $f(c+h) = f(c) + Ah + r(h)$ where $r(h)/h \to 0$ as
$h \to 0$. Then

\[ \lim_{h \to 0} \left[ \frac{f(c+h) - f(c)}{h} \right] = \lim_{h \to 0} \left[ A + \frac{r(h)}{h} \right] = A, \]

so $f$ is differentiable at $c$ with $f'(c) = A$.


Example 8.12. For Example 8.2 with $f(x) = x^2$, we get

\[ (c+h)^2 = c^2 + 2ch + h^2, \]

and $r(h) = h^2$, which goes to zero at a quadratic rate as $h \to 0$.
Example 8.13. For Example 8.4 with $f(x) = 1/x$, we get

\[ \frac{1}{c+h} = \frac{1}{c} - \frac{1}{c^2} h + r(h) \]

for $c \ne 0$, where the quadratically small remainder is

\[ r(h) = \frac{h^2}{c^2(c+h)}. \]
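
As a quick numerical check of Example 8.13 (a sketch under our own choices: the point `c = 2` and the function name `remainder` are illustrative and not from the text), the following Python code computes the remainder directly from its definition and compares $r(h)/h$ with the closed form $h / (c^2 (c+h))$.

```python
def remainder(c, h):
    """r(h) = f(c+h) - f(c) - f'(c) h for f(x) = 1/x, where f'(c) = -1/c^2."""
    return 1.0 / (c + h) - 1.0 / c + h / (c * c)

c = 2.0
hs = [10.0 ** (-k) for k in range(1, 6)]

# r(h)/h computed from the definition; it should tend to 0 with h.
ratios = [remainder(c, h) / h for h in hs]

# The same ratio from the closed form r(h) = h^2 / (c^2 (c + h)).
exact = [h / (c * c * (c + h)) for h in hs]
```

The two lists agree up to rounding error, and both decrease roughly in proportion to $h$, as expected for a remainder that is quadratically small.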




8.1.3. Left and right derivatives. For the most part, we will use derivatives 
that are defined only at the interior points of the domain of a function. Sometimes, 
however, it is convenient to use one-sided left or right derivatives that are defined 
at the endpoint of an interval. 


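Definition 8.14. Suppose that $f : [a,b] \to \mathbb{R}$. Then $f$ is right-differentiable at
$a \le c < b$ with right derivative $f'(c^+)$ if

\[ \lim_{h \to 0^+} \left[ \frac{f(c+h) - f(c)}{h} \right] = f'(c^+) \]

exists, and $f$ is left-differentiable at $a < c \le b$ with left derivative $f'(c^-)$ if

\[ \lim_{h \to 0^-} \left[ \frac{f(c+h) - f(c)}{h} \right] = \lim_{h \to 0^+} \left[ \frac{f(c) - f(c-h)}{h} \right] = f'(c^-). \]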


A function is differentiable at $a < c < b$ if and only if the left and right
derivatives at $c$ both exist and are equal.
Example 8.15. If $f : [0,1] \to \mathbb{R}$ is defined by $f(x) = x^2$, then

\[ f'(0^+) = 0, \qquad f'(1^-) = 2. \]

These left and right derivatives remain the same if $f$ is extended to a function
defined on a larger domain, say

\[ f(x) = \begin{cases} x^2 & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1, \\ 1/x & \text{if } x < 0. \end{cases} \]

For this extended function we have $f'(1^+) = 0$, which is not equal to $f'(1^-)$, and
$f'(0^-)$ does not exist, so the extended function is not differentiable at either 0 or
1.


Example 8.16. The absolute value function $f(x) = |x|$ in Example 8.6 is left and
right differentiable at 0 with left and right derivatives

\[ f'(0^+) = 1, \qquad f'(0^-) = -1. \]

These are not equal, and $f$ is not differentiable at 0.
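
The one-sided difference quotients in Definition 8.14 are easy to tabulate. The Python sketch below is illustrative only (the helper `one_sided_quotients` and the choice of step sizes are our own assumptions); it evaluates both quotients for the absolute value function at 0.

```python
def one_sided_quotients(f, c, hs):
    """Return the left and right difference quotients of f at c for steps h > 0."""
    right = [(f(c + h) - f(c)) / h for h in hs]
    left = [(f(c) - f(c - h)) / h for h in hs]
    return left, right

# Powers of two are exactly representable, so the quotients below are exact.
hs = [2.0 ** (-k) for k in range(1, 11)]
left, right = one_sided_quotients(abs, 0.0, hs)
```

Every right quotient equals $1$ and every left quotient equals $-1$, in agreement with $f'(0^+) = 1$ and $f'(0^-) = -1$.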


8.2. Properties of the derivative 
In this section, we prove some basic properties of differentiable functions. 

8.2.1. Differentiability and continuity. First we discuss the relation between 
differentiability and continuity. 


Theorem 8.17. If $f : (a,b) \to \mathbb{R}$ is differentiable at $c \in (a,b)$, then $f$ is
continuous at $c$.


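Proof. If $f$ is differentiable at $c$, then

\[ \lim_{h \to 0} \left[ f(c+h) - f(c) \right]
   = \lim_{h \to 0} \left[ \frac{f(c+h) - f(c)}{h} \cdot h \right]
   = \lim_{h \to 0} \left[ \frac{f(c+h) - f(c)}{h} \right] \cdot \lim_{h \to 0} h
   = f'(c) \cdot 0
   = 0, \]

which implies that $f$ is continuous at $c$.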


For example, the sign function in Example 8.5 has a jump discontinuity at 0,
so it cannot be differentiable at 0. The converse does not hold, and a continuous
function needn't be differentiable. The functions in Examples 8.6–8.9 are
continuous but not differentiable at 0. Example 9.24 describes a function that is
continuous on $\mathbb{R}$ but not differentiable anywhere.


In Example 8.10, the function is differentiable on $\mathbb{R}$, but the derivative $f'$ is not
continuous at 0. Thus, while a function $f$ has to be continuous to be differentiable,
if $f$ is differentiable its derivative $f'$ need not be continuous. This leads to the
following definition.


Definition 8.18. A function $f : (a,b) \to \mathbb{R}$ is continuously differentiable on $(a,b)$,
written $f \in C^1(a,b)$, if it is differentiable on $(a,b)$ and $f' : (a,b) \to \mathbb{R}$ is continuous.


For example, the function $f(x) = x^2$ with derivative $f'(x) = 2x$ is continuously
differentiable on $\mathbb{R}$, whereas the function in Example 8.10 is not continuously
differentiable at 0. As this example illustrates, functions that are differentiable
but not continuously differentiable may behave in rather pathological ways. On
the other hand, the behavior of continuously differentiable functions, whose graphs
have continuously varying tangent lines, is more-or-less consistent with what one
expects.


8.2.2. Algebraic properties of the derivative. A fundamental property of the 
derivative is that it is a linear operation. In addition, we have the following product 
and quotient rules. 


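Theorem 8.19. If $f, g : (a,b) \to \mathbb{R}$ are differentiable at $c \in (a,b)$ and $k \in \mathbb{R}$, then
$kf$, $f + g$, and $fg$ are differentiable at $c$ with

\[ (kf)'(c) = kf'(c), \qquad (f+g)'(c) = f'(c) + g'(c), \qquad (fg)'(c) = f'(c)g(c) + f(c)g'(c). \]

Furthermore, if $g(c) \ne 0$, then $f/g$ is differentiable at $c$ with

\[ \left( \frac{f}{g} \right)'(c) = \frac{f'(c)g(c) - f(c)g'(c)}{g^2(c)}. \]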


Proof. The first two properties follow immediately from the linearity of limits.
For the product rule, we write

\[ (fg)'(c) = \lim_{h \to 0} \left[ \frac{f(c+h)g(c+h) - f(c)g(c)}{h} \right]
            = \lim_{h \to 0} \left[ \frac{f(c+h) - f(c)}{h} \, g(c+h) + f(c) \, \frac{g(c+h) - g(c)}{h} \right]
            = f'(c)g(c) + f(c)g'(c), \]

where we have used the algebraic properties of limits and Theorem 8.17,
which implies that $g$ is continuous at $c$. The quotient rule follows by a similar
argument, or by combining the product rule with the chain rule, which implies that
$(1/g)' = -g'/g^2$. (See Example 8.22 below.)


Example 8.20. We have $1' = 0$ and $x' = 1$. Repeated application of the product
rule implies that $x^n$ is differentiable on $\mathbb{R}$ for every $n \in \mathbb{N}$ with

\[ (x^n)' = nx^{n-1}. \]

Alternatively, we can prove this result by induction: The formula holds for $n = 1$.
Assuming that it holds for some $n \in \mathbb{N}$, we get from the product rule that

\[ (x^{n+1})' = (x \cdot x^n)' = 1 \cdot x^n + x \cdot nx^{n-1} = (n+1)x^n, \]

and the result follows. It also follows by linearity that every polynomial function
is differentiable on $\mathbb{R}$, and from the quotient rule that every rational function is
differentiable at every point where its denominator is nonzero. The derivatives are
given by their usual formulae.
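
The "usual formulae" for polynomials can be written down directly from the power rule and linearity. The following Python sketch is one possible illustration, with assumptions of our own: a polynomial is represented by its list of coefficients in increasing degree, and the names `poly_derivative` and `poly_eval` are hypothetical helpers.

```python
def poly_derivative(coeffs):
    """Map the coefficients [a0, a1, ..., an] of p to those of p'.

    By linearity and the power rule, the term a_k x^k contributes k a_k x^(k-1).
    """
    return [k * a for k, a in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate a polynomial at x by Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

p = [1.0, -3.0, 0.0, 2.0]   # p(x) = 1 - 3x + 2x^3
dp = poly_derivative(p)     # p'(x) = -3 + 6x^2

# Compare p'(2) with a symmetric difference quotient of p at 2.
h = 1e-6
numeric = (poly_eval(p, 2.0 + h) - poly_eval(p, 2.0 - h)) / (2.0 * h)
```

Here `dp` holds the coefficients of $-3 + 6x^2$, and the difference quotient `numeric` is close to $p'(2) = 21$.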


8.3. The chain rule 


The chain rule states that the composition of differentiable functions is
differentiable. The result is quite natural if one thinks in terms of derivatives as linear
maps. If $f$ is differentiable at $c$, it scales lengths by a factor $f'(c)$, and if $g$ is
differentiable at $f(c)$, it scales lengths by a factor $g'(f(c))$. Thus, the composition
$g \circ f$ scales lengths at $c$ by a factor $g'(f(c)) \cdot f'(c)$. Equivalently, the derivative of
a composition is the composition of the derivatives (regarded as linear maps).


We will prove the chain rule by showing that the composition of remainder
terms in the linear approximations of $f$ and $g$ leads to a similar remainder term in
the linear approximation of $g \circ f$. The argument is complicated by the fact that we
have to evaluate the remainder of $g$ at a point that depends on the remainder of $f$,
but this complication should not obscure the simplicity of the final result.


Theorem 8.21 (Chain rule). Let $f : A \to \mathbb{R}$ and $g : B \to \mathbb{R}$ where $A \subset \mathbb{R}$ and
$f(A) \subset B$, and suppose that $c$ is an interior point of $A$ and $f(c)$ is an interior point
of $B$. If $f$ is differentiable at $c$ and $g$ is differentiable at $f(c)$, then $g \circ f : A \to \mathbb{R}$ is
differentiable at $c$ and

\[ (g \circ f)'(c) = g'(f(c)) \, f'(c). \]


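Proof. Since $f$ is differentiable at $c$, there is a function $r(h)$ such that

\[ f(c+h) = f(c) + f'(c)h + r(h), \qquad \lim_{h \to 0} \frac{r(h)}{h} = 0, \]

and since $g$ is differentiable at $f(c)$, there is a function $s(k)$ such that

\[ g(f(c) + k) = g(f(c)) + g'(f(c))k + s(k), \qquad \lim_{k \to 0} \frac{s(k)}{k} = 0. \]

It follows that

\[ (g \circ f)(c+h) = g\left( f(c) + f'(c)h + r(h) \right)
   = g(f(c)) + g'(f(c)) \cdot \left( f'(c)h + r(h) \right) + s\left( f'(c)h + r(h) \right)
   = g(f(c)) + g'(f(c)) f'(c) \cdot h + t(h), \]

where

\[ t(h) = g'(f(c)) \cdot r(h) + s(\varphi(h)), \qquad \varphi(h) = f'(c)h + r(h). \]

Since $r(h)/h \to 0$ as $h \to 0$, we have

\[ \lim_{h \to 0} \frac{t(h)}{h} = \lim_{h \to 0} \frac{s(\varphi(h))}{h}. \]

We claim that this limit exists and is zero, and then it follows from Proposition 8.11
that $g \circ f$ is differentiable at $c$ with

\[ (g \circ f)'(c) = g'(f(c)) \, f'(c). \]

To prove the claim, we use the facts that

\[ \frac{\varphi(h)}{h} \to f'(c) \quad \text{as } h \to 0, \qquad \frac{s(k)}{k} \to 0 \quad \text{as } k \to 0. \]

Roughly speaking, we have $\varphi(h) \sim f'(c)h$ when $h$ is small and therefore

\[ \frac{s(\varphi(h))}{h} \sim \frac{s(f'(c)h)}{h} \to 0 \quad \text{as } h \to 0. \]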
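In detail, let $\epsilon > 0$ be given. We want to show that there exists $\delta > 0$ such that

\[ \left| \frac{s(\varphi(h))}{h} \right| < \epsilon \quad \text{if } 0 < |h| < \delta. \]

First, choose $\delta_1 > 0$ such that

\[ \left| \frac{r(h)}{h} \right| < |f'(c)| + 1 \quad \text{if } 0 < |h| < \delta_1. \]

If $0 < |h| < \delta_1$, then

\[ |\varphi(h)| \le |f'(c)||h| + |r(h)|
   < |f'(c)||h| + \left( |f'(c)| + 1 \right)|h|
   = \left( 2|f'(c)| + 1 \right)|h|. \]

Next, choose $\eta > 0$ so that

\[ \left| \frac{s(k)}{k} \right| < \frac{\epsilon}{2|f'(c)| + 1} \quad \text{if } 0 < |k| < \eta. \]

(We include a "1" in the denominator on the right-hand side to avoid a division by
zero if $f'(c) = 0$.) Finally, define $\delta_2 > 0$ by

\[ \delta_2 = \frac{\eta}{2|f'(c)| + 1}, \]

and let $\delta = \min(\delta_1, \delta_2) > 0$.
If $0 < |h| < \delta$ and $\varphi(h) \ne 0$, then $0 < |\varphi(h)| < \eta$, so

\[ |s(\varphi(h))| \le \frac{\epsilon |\varphi(h)|}{2|f'(c)| + 1} < \epsilon |h|. \]

If $\varphi(h) = 0$, then $s(\varphi(h)) = 0$, so the inequality holds in that case also. This proves
that

\[ \lim_{h \to 0} \frac{s(\varphi(h))}{h} = 0. \]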
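
A numerical sanity check of the chain rule is sketched below in Python. Everything specific in it is an assumption made for illustration: the functions $f(x) = x^2$ and $g(y) = \sin y$, the point $c = 0.7$, and the helper `central_difference`, which approximates a derivative by a symmetric difference quotient.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric quotient (func(x+h) - func(x-h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def f(x):
    return x * x

def g(y):
    return math.sin(y)

def composite(x):
    """(g o f)(x) = sin(x^2)."""
    return g(f(x))

c = 0.7
numeric = central_difference(composite, c)

# Chain rule: (g o f)'(c) = g'(f(c)) f'(c) = cos(c^2) * 2c.
chain_rule = math.cos(f(c)) * (2.0 * c)
```

The two values agree to within the accuracy of the finite-difference approximation.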


Example 8.22. Suppose that $f$ is differentiable at $c$ and $f(c) \ne 0$. Then $g(y) = 1/y$
is differentiable at $f(c)$, with $g'(y) = -1/y^2$ (see Example 8.4). It follows that the
reciprocal function $1/f = g \circ f$ is differentiable at $c$ with

\[ \left( \frac{1}{f} \right)'(c) = g'(f(c)) f'(c) = -\frac{f'(c)}{f(c)^2}. \]

The chain rule gives an expression for the derivative of an inverse function. In
terms of linear approximations, it states that if $f$ scales lengths at $c$ by a nonzero
factor $f'(c)$, then $f^{-1}$ scales lengths at $f(c)$ by the factor $1/f'(c)$.


Proposition 8.23. Suppose that $f : A \to \mathbb{R}$ is a one-to-one function on $A \subset \mathbb{R}$
with inverse $f^{-1} : B \to \mathbb{R}$ where $B = f(A)$. Assume that $f$ is differentiable at an
interior point $c \in A$ and $f^{-1}$ is differentiable at $f(c)$, where $f(c)$ is an interior point
of $B$. Then $f'(c) \ne 0$ and

\[ (f^{-1})'(f(c)) = \frac{1}{f'(c)}. \]
Proof. The definition of the inverse implies that

\[ f^{-1}(f(x)) = x. \]

Since $f$ is differentiable at $c$ and $f^{-1}$ is differentiable at $f(c)$, the chain rule implies
that

\[ (f^{-1})'(f(c)) \, f'(c) = 1. \]

This equation shows that $f'(c) \ne 0$, and dividing it by $f'(c)$, we get the result.
Moreover, it follows that $f^{-1}$ cannot be differentiable at $f(c)$ if $f'(c) = 0$.


Alternatively, setting $d = f(c)$, we can write the result as

\[ (f^{-1})'(d) = \frac{1}{f'(f^{-1}(d))}. \]

Proposition 8.23 is not entirely satisfactory because it assumes the existence
and differentiability of an inverse function. We will return to this question in
Section 8.7 below, but we end this section with some examples that illustrate the
necessity of the condition $f'(c) \ne 0$ for the existence and differentiability of the
inverse.


Example 8.24. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$. Then $f'(0) = 0$ and $f$ is not
invertible on any neighborhood of the origin, since it is non-monotone and not
one-to-one. On the other hand, if $f : (0,\infty) \to (0,\infty)$ is defined by $f(x) = x^2$, then
$f'(x) = 2x \ne 0$ and the inverse function $f^{-1} : (0,\infty) \to (0,\infty)$ is given by

\[ f^{-1}(y) = \sqrt{y}. \]

The formula for the derivative of the inverse gives

\[ (f^{-1})'(x^2) = \frac{1}{f'(x)} = \frac{1}{2x}, \]

or, writing $x = f^{-1}(y)$,

\[ (f^{-1})'(y) = \frac{1}{2\sqrt{y}}, \]

in agreement with Example 8.7.
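Example 8.25. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^3$. Then $f$ is strictly increasing,
one-to-one, and onto. The inverse function $f^{-1} : \mathbb{R} \to \mathbb{R}$ is given by

\[ f^{-1}(y) = y^{1/3}. \]

Then $f'(0) = 0$ and $f^{-1}$ is not differentiable at $f(0) = 0$. On the other hand, $f^{-1}$
is differentiable at non-zero points of $\mathbb{R}$, with

\[ (f^{-1})'(x^3) = \frac{1}{f'(x)} = \frac{1}{3x^2}, \]

or, writing $x = y^{1/3}$,

\[ (f^{-1})'(y) = \frac{1}{3y^{2/3}}, \]

in agreement with Example 8.8.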
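
The formula for the derivative of the inverse in Example 8.25 can be checked numerically. The Python sketch below is an illustration only; the point $c = 2$, the step sizes, and the helper `f_inverse` (a real cube root built with `math.copysign`) are our own choices.

```python
import math

def f(x):
    return x ** 3

def fprime(x):
    return 3.0 * x * x

def f_inverse(y):
    """Real cube root of y, valid for negative y as well."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

c = 2.0
d = f(c)          # d = 8
h = 1e-6

# Symmetric difference quotient of the inverse at d = f(c).
numeric = (f_inverse(d + h) - f_inverse(d - h)) / (2.0 * h)
predicted = 1.0 / fprime(c)   # 1 / (3 c^2) = 1/12

# At d = 0, where f'(0) = 0, the difference quotient h^(-2/3) of the inverse is large.
quotient_at_zero = (f_inverse(h) - f_inverse(0.0)) / h
```

The value `numeric` is close to $1/12$, while `quotient_at_zero` is of order $10^4$ and grows without bound as $h \to 0$.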


8.4. Extreme values 


One of the most useful applications of the derivative is in locating the maxima and 
minima of functions. 


Definition 8.26. Suppose that $f : A \to \mathbb{R}$. Then $f$ has a global (or absolute)
maximum at $c \in A$ if

\[ f(x) \le f(c) \quad \text{for all } x \in A, \]

and $f$ has a local (or relative) maximum at $c \in A$ if there is a neighborhood $U$ of
$c$ such that

\[ f(x) \le f(c) \quad \text{for all } x \in A \cap U. \]

Similarly, $f$ has a global (or absolute) minimum at $c \in A$ if

\[ f(x) \ge f(c) \quad \text{for all } x \in A, \]

and $f$ has a local (or relative) minimum at $c \in A$ if there is a neighborhood $U$ of $c$
such that

\[ f(x) \ge f(c) \quad \text{for all } x \in A \cap U. \]

If $f$ has a (local or global) maximum or minimum at $c \in A$, then $f$ is said to have
a (local or global) extreme value at $c$.


The Weierstrass extreme value theorem states that a continuous function on a
compact set has a global maximum and minimum, but it does not say how to find
them. The following fundamental result goes back to Fermat.


Theorem 8.27. If $f : A \subset \mathbb{R} \to \mathbb{R}$ has a local extreme value at an interior point
$c \in A$ and $f$ is differentiable at $c$, then $f'(c) = 0$.


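Proof. If $f$ has a local maximum at $c$, then $f(x) \le f(c)$ for all $x$ in a $\delta$-neighborhood
$(c - \delta, c + \delta)$ of $c$, so

\[ \frac{f(c+h) - f(c)}{h} \le 0 \quad \text{for all } 0 < h < \delta, \]

which implies that

\[ f'(c) = \lim_{h \to 0^+} \left[ \frac{f(c+h) - f(c)}{h} \right] \le 0. \]

Moreover,

\[ \frac{f(c+h) - f(c)}{h} \ge 0 \quad \text{for all } -\delta < h < 0, \]

which implies that

\[ f'(c) = \lim_{h \to 0^-} \left[ \frac{f(c+h) - f(c)}{h} \right] \ge 0. \]

It follows that $f'(c) = 0$. If $f$ has a local minimum at $c$, then the signs in these
inequalities are reversed, and we also conclude that $f'(c) = 0$.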


For this result to hold, it is crucial that c is an interior point, since we look at 
the sign of the difference quotient of f on both sides of c. At an endpoint, we get 
the following inequality condition on the derivative. (Draw a graph!) 


Proposition 8.28. Let $f : [a,b] \to \mathbb{R}$. If the right derivative of $f$ exists at $a$, then:
$f'(a^+) \le 0$ if $f$ has a local maximum at $a$; and $f'(a^+) \ge 0$ if $f$ has a local minimum
at $a$. Similarly, if the left derivative of $f$ exists at $b$, then: $f'(b^-) \ge 0$ if $f$ has a
local maximum at $b$; and $f'(b^-) \le 0$ if $f$ has a local minimum at $b$.


Proof. If the right derivative of $f$ exists at $a$, and $f$ has a local maximum at $a$,
then there exists $\delta > 0$ such that $f(x) \le f(a)$ for $a \le x < a + \delta$, so

\[ f'(a^+) = \lim_{h \to 0^+} \left[ \frac{f(a+h) - f(a)}{h} \right] \le 0. \]

Similarly, if the left derivative of $f$ exists at $b$, and $f$ has a local maximum at $b$,
then $f(x) \le f(b)$ for $b - \delta < x \le b$, so $f'(b^-) \ge 0$. The signs are reversed for local
minima at the endpoints.


In searching for extreme values of a function, it is convenient to introduce the 
following classification of points in the domain of the function. 


Definition 8.29. Suppose that $f : A \subset \mathbb{R} \to \mathbb{R}$. An interior point $c \in A$ such that
$f$ is not differentiable at $c$ or $f'(c) = 0$ is called a critical point of $f$. An interior
point where $f'(c) = 0$ is called a stationary point of $f$.


Theorem 8.27 limits the search for maxima or minima of a function $f$ on $A$ to
the following points.


(1) Boundary points of A. 

(2) Critical points of f: 
(a) interior points where f is not differentiable; 
(b) stationary points where f’(c) = 0. 


Additional tests are required to determine which of these points gives local or global 
extreme values of f. In particular, a function need not attain an extreme value at 
a critical point. 


Example 8.30. If $f : [-1,1] \to \mathbb{R}$ is the function

\[ f(x) = \begin{cases} x & \text{if } -1 \le x \le 0, \\ 2x & \text{if } 0 < x \le 1, \end{cases} \]

then $x = 0$ is a critical point since $f$ is not differentiable at 0, but $f$ does not attain
a local extreme value at 0. The global maximum and minimum of $f$ are attained
at the endpoints $x = 1$ and $x = -1$, respectively, and $f$ has no other local extreme
values.


Example 8.31. If $f : [-1,1] \to \mathbb{R}$ is the function $f(x) = x^3$, then $x = 0$ is a critical
point since $f'(0) = 0$, but $f$ does not attain a local extreme value at 0. The global
maximum and minimum of $f$ are attained at the endpoints $x = 1$ and $x = -1$,
respectively, and $f$ has no other local extreme values.
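
For a continuous function on a closed, bounded interval, the search described above reduces to comparing the values of $f$ at finitely many candidates when there are finitely many critical points. The Python sketch below illustrates this for Example 8.31; the helper `extreme_values` is a hypothetical name, and the candidate list (the endpoints and the stationary point 0) is supplied by hand rather than computed.

```python
def extreme_values(f, candidates):
    """Return ((min value, location), (max value, location)) over the candidates.

    This finds the global extreme values only if the candidate list contains
    every boundary point and critical point and f attains its extreme values.
    """
    values = [(f(x), x) for x in candidates]
    return min(values), max(values)

def f(x):
    return x ** 3

# Endpoints of [-1, 1] together with the stationary point x = 0.
candidates = [-1.0, 0.0, 1.0]
(global_min, argmin), (global_max, argmax) = extreme_values(f, candidates)
```

The minimum $-1$ is found at $x = -1$ and the maximum $1$ at $x = 1$; the stationary point $x = 0$ is a candidate but gives neither.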


8.5. The mean value theorem 


The mean value theorem is a key result that connects the global behavior of a
function $f : [a,b] \to \mathbb{R}$, described by the difference $f(b) - f(a)$, to its local behavior,
described by the derivative $f' : (a,b) \to \mathbb{R}$. We begin by proving a special case.


Theorem 8.32 (Rolle). Suppose that $f : [a,b] \to \mathbb{R}$ is continuous on the closed,
bounded interval $[a,b]$, differentiable on the open interval $(a,b)$, and $f(a) = f(b)$.
Then there exists $a < c < b$ such that $f'(c) = 0$.


Proof. By the Weierstrass extreme value theorem, $f$ attains its
global maximum and minimum values on $[a,b]$. If these are both attained at the
endpoints, then $f$ is constant since $f(a) = f(b)$, and $f'(c) = 0$ for every $a < c < b$.
Otherwise, $f$ attains at least one of its global maximum or minimum values at an
interior point $a < c < b$. Theorem 8.27 implies that $f'(c) = 0$.


Note that we require continuity on the closed interval [a,b] but differentiability 
only on the open interval (a,b). This proof is deceptively simple, but the result 
is not trivial. It relies on the extreme value theorem, which in turn relies on the 
completeness of R. The theorem would not be true if we restricted attention to 
functions defined on the rationals Q. 


The mean value theorem is an immediate consequence of Rolle's theorem: for
a general function $f$ with $f(a) \ne f(b)$, we subtract off a linear function to make
the values of the resulting function equal at the endpoints.


Theorem 8.33 (Mean value). Suppose that $f : [a,b] \to \mathbb{R}$ is continuous on the
closed, bounded interval $[a,b]$ and differentiable on the open interval $(a,b)$. Then
there exists $a < c < b$ such that

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]

Proof. The function $g : [a,b] \to \mathbb{R}$ defined by

\[ g(x) = f(x) - f(a) - \left[ \frac{f(b) - f(a)}{b - a} \right] (x - a) \]

is continuous on $[a,b]$ and differentiable on $(a,b)$ with

\[ g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}. \]

Moreover, $g(a) = g(b) = 0$. Rolle's Theorem implies that there exists $a < c < b$
such that $g'(c) = 0$, which proves the result.


Graphically, this result says that there is a point $a < c < b$ at which the slope of
the tangent line to the graph $y = f(x)$ is equal to the slope of the chord between
the endpoints $(a, f(a))$ and $(b, f(b))$.
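
The mean value theorem asserts only that such a point $c$ exists, but in simple cases one can locate it. The Python sketch below does so for $f(x) = x^3$ on $[0, 2]$, an example of our own choosing. It applies bisection to $f'(x)$ minus the chord slope, which assumes that $f'$ is continuous and changes sign on the interval; the theorem itself needs neither assumption. The helper name `bisection` is hypothetical.

```python
import math

def bisection(func, lo, hi, tol=1e-12):
    """Locate a zero of a continuous func on [lo, hi], assuming a sign change."""
    flo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid   # zero lies in the right half
        else:
            hi = mid              # zero lies in the left half
    return 0.5 * (lo + hi)

def f(x):
    return x ** 3

def fprime(x):
    return 3.0 * x * x

a, b = 0.0, 2.0
chord_slope = (f(b) - f(a)) / (b - a)   # (8 - 0) / 2 = 4

# Solve f'(c) = chord_slope for a < c < b.
c = bisection(lambda x: fprime(x) - chord_slope, a, b)
```

Here $3c^2 = 4$, so the computed point is close to $c = 2/\sqrt{3}$.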


As a first application, we prove a converse to the obvious fact that the derivative
of a constant function is zero.


Theorem 8.34. If $f : (a,b) \to \mathbb{R}$ is differentiable on $(a,b)$ and $f'(x) = 0$ for every
$a < x < b$, then $f$ is constant on $(a,b)$.


Proof. Fix $x_0 \in (a,b)$. The mean value theorem implies that for all $x \in (a,b)$ with
$x \ne x_0$,

\[ f'(c) = \frac{f(x) - f(x_0)}{x - x_0} \]

for some $c$ between $x_0$ and $x$. Since $f'(c) = 0$, it follows that $f(x) = f(x_0)$ for all
$x \in (a,b)$, meaning that $f$ is constant on $(a,b)$.


Corollary 8.35. If $f, g : (a,b) \to \mathbb{R}$ are differentiable on $(a,b)$ and $f'(x) = g'(x)$
for every $a < x < b$, then $f(x) = g(x) + C$ for some constant $C$.


Proof. This follows from the previous theorem since $(f - g)' = 0$.


We can also use the mean value theorem to relate the monotonicity of a
differentiable function with the sign of its derivative. (See the discussion of monotonic
functions in Chapter 7 for our terminology for increasing and decreasing functions.)


Theorem 8.36. Suppose that $f : (a,b) \to \mathbb{R}$ is differentiable on $(a,b)$. Then $f$ is
increasing if and only if $f'(x) \ge 0$ for every $a < x < b$, and decreasing if and only
if $f'(x) \le 0$ for every $a < x < b$. Furthermore, if $f'(x) > 0$ for every $a < x < b$
then $f$ is strictly increasing, and if $f'(x) < 0$ for every $a < x < b$ then $f$ is strictly
decreasing.


Proof. If $f$ is increasing and $a < x < b$, then

\[ \frac{f(x+h) - f(x)}{h} \ge 0 \]

for all sufficiently small $h$ (positive or negative), so

\[ f'(x) = \lim_{h \to 0} \left[ \frac{f(x+h) - f(x)}{h} \right] \ge 0. \]

Conversely, if $f' \ge 0$ and $a < x < y < b$, then by the mean value theorem there
exists $x < c < y$ such that

\[ \frac{f(y) - f(x)}{y - x} = f'(c) \ge 0, \]

which implies that $f(x) \le f(y)$, so $f$ is increasing. Moreover, if $f'(c) > 0$, we get
$f(x) < f(y)$, so $f$ is strictly increasing.


The results for a decreasing function $f$ follow in a similar way, or we can apply
the previous results to the increasing function $-f$.


Note that although $f' > 0$ implies that $f$ is strictly increasing, the fact that $f$ is
strictly increasing does not imply that $f' > 0$.


Example 8.37. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$ is strictly increasing
on $\mathbb{R}$, but $f'(0) = 0$.


If $f$ is continuously differentiable and $f'(c) > 0$, then $f'(x) > 0$ for all $x$ in a
neighborhood of $c$ and Theorem 8.36 implies that $f$ is strictly increasing near $c$.
This conclusion may fail if $f$ is not continuously differentiable at $c$.


Example 8.38. Define $f : \mathbb{R} \to \mathbb{R}$ by

\[ f(x) = \begin{cases} x/2 + x^2 \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]

Then $f$ is differentiable on $\mathbb{R}$ with

\[ f'(x) = \begin{cases} 1/2 - \cos(1/x) + 2x \sin(1/x) & \text{if } x \ne 0, \\ 1/2 & \text{if } x = 0. \end{cases} \]

Every neighborhood of 0 includes intervals where $f' < 0$ or $f' > 0$, in which $f$ is
strictly decreasing or strictly increasing, respectively. Thus, despite the fact that
$f'(0) > 0$, the function $f$ is not strictly increasing in any neighborhood of 0. As a
result, no local inverse of the function $f$ exists on any neighborhood of 0.
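
To see concretely why $f'$ changes sign arbitrarily close to 0 in Example 8.38, one can evaluate $f'$ at the points $x_n = 1/(2\pi n)$, where $\cos(1/x_n) = 1$ and $\sin(1/x_n) = 0$, so that $f'(x_n) = -1/2$. The Python sketch below does this in floating-point arithmetic; the function name `fprime` and the sampled values of $n$ are our own choices.

```python
import math

def fprime(x):
    """Derivative of f(x) = x/2 + x^2 sin(1/x), with f'(0) = 1/2."""
    if x == 0:
        return 0.5
    return 0.5 - math.cos(1.0 / x) + 2.0 * x * math.sin(1.0 / x)

# Points x_n = 1/(2 pi n) approach 0, and f'(x_n) = 1/2 - 1 + 0 = -1/2.
points = [1.0 / (2.0 * math.pi * n) for n in (1, 10, 100, 1000)]
slopes = [fprime(x) for x in points]
```

Each sampled slope is close to $-1/2$, even though $f'(0) = 1/2 > 0$.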


8.6. Taylor’s theorem 


If $f : (a,b) \to \mathbb{R}$ is differentiable on $(a,b)$ and $f' : (a,b) \to \mathbb{R}$ is differentiable, then
we define the second derivative $f'' : (a,b) \to \mathbb{R}$ of $f$ as the derivative of $f'$. We
define higher-order derivatives similarly. If $f$ has derivatives $f^{(n)} : (a,b) \to \mathbb{R}$ of all
orders $n \in \mathbb{N}$, then we say that $f$ is infinitely differentiable on $(a,b)$.

Taylor's theorem gives an approximation for an $(n+1)$-times differentiable
function in terms of its Taylor polynomial of degree $n$.


Definition 8.39. Let $f : (a,b) \to \mathbb{R}$ and suppose that $f$ has $n$ derivatives

\[ f', f'', \dots, f^{(n)} : (a,b) \to \mathbb{R} \]

on $(a,b)$. The Taylor polynomial of degree $n$ of $f$ at $a < c < b$ is

\[ P_n(x) = f(c) + f'(c)(x - c) + \frac{1}{2!} f''(c)(x - c)^2 + \dots + \frac{1}{n!} f^{(n)}(c)(x - c)^n. \]

Equivalently,

\[ P_n(x) = \sum_{k=0}^{n} a_k (x - c)^k, \qquad a_k = \frac{1}{k!} f^{(k)}(c). \]

We call $a_k$ the $k$th Taylor coefficient of $f$ at $c$. The computation of the Taylor
polynomials in the following examples is left as an exercise.
Example 8.40. If $P(x)$ is a polynomial of degree $n$, then $P_n(x) = P(x)$.


Example 8.41. The Taylor polynomial of degree $n$ of $e^x$ at $x = 0$ is

\[ P_n(x) = 1 + x + \frac{1}{2!} x^2 + \dots + \frac{1}{n!} x^n. \]

Example 8.42. The Taylor polynomial of degree $2n$ of $\cos x$ at $x = 0$ is

\[ P_{2n}(x) = 1 - \frac{1}{2!} x^2 + \frac{1}{4!} x^4 - \dots + (-1)^n \frac{1}{(2n)!} x^{2n}. \]

We also have $P_{2n+1} = P_{2n}$, since the Taylor coefficients of odd order are zero.
Example 8.43. The Taylor polynomial of degree $2n + 1$ of $\sin x$ at $x = 0$ is

\[ P_{2n+1}(x) = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \dots + (-1)^n \frac{1}{(2n+1)!} x^{2n+1}. \]

We also have $P_{2n+2} = P_{2n+1}$.
Example 8.44. The Taylor polynomial of degree $n$ of $1/x$ at $x = 1$ is

\[ P_n(x) = 1 - (x - 1) + (x - 1)^2 - \dots + (-1)^n (x - 1)^n. \]

Example 8.45. The Taylor polynomial of degree $n$ of $\log x$ at $x = 1$ is

\[ P_n(x) = (x - 1) - \frac{1}{2}(x - 1)^2 + \frac{1}{3}(x - 1)^3 - \dots + \frac{(-1)^{n+1}}{n}(x - 1)^n. \]
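
The Taylor polynomials in Examples 8.41 and 8.45 are easy to evaluate numerically. The following Python sketch is only an illustration (the function names `taylor_exp` and `taylor_log` and the sample points are our own); it compares the polynomials with the library values of $e^x$ and $\log x$.

```python
import math

def taylor_exp(x, n):
    """P_n(x) = sum_{k=0}^{n} x^k / k!, the Taylor polynomial of e^x at 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def taylor_log(x, n):
    """P_n(x) = sum_{k=1}^{n} (-1)^(k+1) (x-1)^k / k, for log x at 1."""
    return sum((-1) ** (k + 1) * (x - 1.0) ** k / k for k in range(1, n + 1))

# Errors |e - P_n(1)| for increasing degree n.
errors_exp = [abs(math.exp(1.0) - taylor_exp(1.0, n)) for n in (2, 4, 8)]

# Error |log(1.5) - P_20(1.5)|.
error_log = abs(math.log(1.5) - taylor_log(1.5, 20))
```

At these sample points the errors decrease as the degree grows. This is a numerical observation about two particular functions, not a general statement about the convergence of Taylor polynomials.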


We write

\[ f(x) = P_n(x) + R_n(x), \]

where $R_n$ is the error, or remainder, between $f$ and its Taylor polynomial $P_n$. The
next theorem is one version of Taylor's theorem, which gives an expression for the
remainder due to Lagrange. It can be regarded as a generalization of the mean
value theorem, which corresponds to the case $n = 0$. The idea of the proof is to
subtract a suitable polynomial from the function and apply Rolle's theorem, just
as we proved the mean value theorem by subtracting a suitable linear function.


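Theorem 8.46 (Taylor with Lagrange Remainder). Suppose that $f : (a,b) \to \mathbb{R}$
has $n + 1$ derivatives on $(a,b)$ and let $a < c < b$. For every $a < x < b$, there exists
$\xi$ between $c$ and $x$ such that

\[ f(x) = f(c) + f'(c)(x - c) + \frac{1}{2!} f''(c)(x - c)^2 + \dots + \frac{1}{n!} f^{(n)}(c)(x - c)^n + R_n(x), \]

where

\[ R_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi)(x - c)^{n+1}. \]

Proof. Fix $x, c \in (a,b)$ with $x \ne c$. For $t \in (a,b)$, let

\[ g(t) = f(x) - f(t) - f'(t)(x - t) - \frac{1}{2!} f''(t)(x - t)^2 - \dots - \frac{1}{n!} f^{(n)}(t)(x - t)^n. \]

Then $g(x) = 0$ and

\[ g'(t) = -\frac{1}{n!} f^{(n+1)}(t)(x - t)^n. \]

Define

\[ h(t) = g(t) - \left( \frac{x - t}{x - c} \right)^{n+1} g(c). \]

Then $h(c) = h(x) = 0$, so by Rolle's theorem, there exists a point $\xi$ between $c$ and
$x$ such that $h'(\xi) = 0$, which implies that

\[ g'(\xi) + (n+1) \frac{(x - \xi)^n}{(x - c)^{n+1}} g(c) = 0. \]

It follows from the expression for $g'$ that

\[ \frac{1}{n!} f^{(n+1)}(\xi)(x - \xi)^n = (n+1) \frac{(x - \xi)^n}{(x - c)^{n+1}} g(c), \]

and using the expression for $g$ in this equation, we get the result.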


Note that the remainder term

\[ R_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi)(x - c)^{n+1} \]

has the same form as the $(n+1)$th term in the Taylor polynomial of $f$, except that
the derivative is evaluated at a (typically unknown) intermediate point $\xi$ between
$c$ and $x$, instead of at $c$.


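Example 8.47. Let us prove that

\[ \lim_{x \to 0} \left( \frac{1 - \cos x}{x^2} \right) = \frac{1}{2}. \]

By Taylor's theorem,

\[ \cos x = 1 - \frac{1}{2} x^2 + \frac{1}{4!} (\cos \xi) x^4 \]

for some $\xi$ between 0 and $x$. It follows that for $x \ne 0$,

\[ \frac{1 - \cos x}{x^2} - \frac{1}{2} = -\frac{1}{4!} (\cos \xi) x^2. \]

Since $|\cos \xi| \le 1$, we get

\[ \left| \frac{1 - \cos x}{x^2} - \frac{1}{2} \right| \le \frac{1}{4!} x^2, \]

which implies that

\[ \lim_{x \to 0} \left( \frac{1 - \cos x}{x^2} \right) = \frac{1}{2}. \]

Note that as well as proving the limit, Taylor's theorem gives an explicit upper
bound for the difference between $(1 - \cos x)/x^2$ and its limit $1/2$. For example,

\[ \left| \frac{1 - \cos(0.1)}{(0.1)^2} - \frac{1}{2} \right| \le \frac{1}{2400}. \]

Numerically, we have

\[ \frac{1}{2} - \frac{1 - \cos(0.1)}{(0.1)^2} \approx 0.00041653, \qquad \frac{1}{2400} \approx 0.00041667. \]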
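
The numerical values quoted in Example 8.47 can be reproduced with a few lines of Python. This is a floating-point check only; the variable names are our own.

```python
import math

def quotient(x):
    """(1 - cos x) / x^2 for x != 0."""
    return (1.0 - math.cos(x)) / (x * x)

x = 0.1
difference = 0.5 - quotient(x)   # distance of the quotient from its limit 1/2
bound = x * x / 24.0             # Taylor bound x^2 / 4! = 1/2400
```

The computed `difference` is positive and slightly smaller than `bound`, as the inequality above predicts.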


In a later section, we derive an alternative expression for the remainder $R_n$ as an
integral.


8.7. * The inverse function theorem 


The inverse function theorem gives a sufficient condition for a differentiable function
$f$ to be locally invertible at a point $c$ with differentiable inverse: namely, that $f$ is
continuously differentiable at $c$ and $f'(c) \ne 0$. Example 8.24 shows that one cannot
expect the inverse of a differentiable function $f$ to exist locally at $c$ if $f'(c) = 0$,
while Example 8.38 shows that the condition $f'(c) \ne 0$ is not, on its own, sufficient
to imply the existence of a local inverse.


Before stating the theorem, we give a precise definition of local invertibility.
Definition 8.48. A function $f : A \to \mathbb{R}$ is locally invertible at an interior point
$c \in A$ if there exist open neighborhoods $U$ of $c$ and $V$ of $f(c)$ such that $f|_U : U \to V$
is one-to-one and onto, in which case $f$ has a local inverse $(f|_U)^{-1} : V \to U$.


The following examples illustrate the definition. 
Example 8.49. If $f : \mathbb{R} \to \mathbb{R}$ is the square function $f(x) = x^2$, then a local inverse
at $c = 2$ with $U = (1,3)$ and $V = (1,9)$ is defined by

\[ (f|_U)^{-1}(y) = \sqrt{y}. \]

Similarly, a local inverse at $c = -2$ with $U = (-3,-1)$ and $V = (1,9)$ is defined by

\[ (f|_U)^{-1}(y) = -\sqrt{y}. \]

In defining a local inverse at $c$, we require that it maps an open neighborhood $V$ of
$f(c)$ onto an open neighborhood $U$ of $c$; that is, we want $(f|_U)^{-1}(y)$ to be "close"
to $c$ when $y$ is "close" to $f(c)$, not some more distant point that $f$ also maps "close"
to $f(c)$. Thus, the one-to-one, onto function $g$ defined by

\[ g : (1,9) \to (-2,-1) \cup [2,3), \qquad g(y) = \begin{cases} -\sqrt{y} & \text{if } 1 < y < 4, \\ \sqrt{y} & \text{if } 4 \le y < 9, \end{cases} \]

is not a local inverse of $f$ at $c = 2$ in the sense of Definition 8.48, even though
$g(f(2)) = 2$ and both compositions

\[ f \circ g : (1,9) \to (1,9), \qquad g \circ f : (-2,-1) \cup [2,3) \to (-2,-1) \cup [2,3) \]

are identity maps, since $U = (-2,-1) \cup [2,3)$ is not a neighborhood of 2.




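Example 8.50. The function $f : \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) = \begin{cases} \cos(1/x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases} \]
is locally invertible at every $c \in \mathbb{R}$ with $c \ne 0$ and $c \ne 1/(n\pi)$ for every nonzero $n \in \mathbb{Z}$.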

Theorem 8.51 (Inverse function). Suppose that $f : A \subset \mathbb{R} \to \mathbb{R}$ and $c \in A$ is an
interior point of $A$. If $f$ is differentiable in a neighborhood of $c$, $f'(c) \ne 0$, and $f'$ is
continuous at $c$, then there are open neighborhoods $U$ of $c$ and $V$ of $f(c)$ such that
$f$ has a local inverse $(f|_U)^{-1} : V \to U$. Furthermore, the local inverse function is
differentiable at $f(c)$ with derivative
\[ \left( (f|_U)^{-1} \right)'(f(c)) = \frac{1}{f'(c)}. \]


Proof. Suppose, for definiteness, that $f'(c) > 0$ (otherwise, consider $-f$). By the
continuity of $f'$, there exists an open interval $U = (a, b)$ containing $c$ on which
$f' > 0$. It follows from Theorem that $f$ is strictly increasing on $U$. Writing
\[ V = f(U) = (f(a), f(b)), \]
we see that $f|_U : U \to V$ is one-to-one and onto, so $f$ has a local inverse on $V$,
which proves the first part of the theorem.


It remains to prove that the local inverse $(f|_U)^{-1}$, which we denote by $f^{-1}$ for
short, is differentiable. First, since $f$ is differentiable at $c$, we have
\[ f(c + h) = f(c) + f'(c) h + r(h) \]
where the remainder $r$ satisfies
\[ \lim_{h \to 0} \frac{r(h)}{h} = 0. \]
Since $f'(c) > 0$, there exists $\delta > 0$ such that
\[ |r(h)| \le \frac{1}{2} f'(c) |h| \qquad \text{for } |h| < \delta. \]
It follows from the differentiability of $f$ that, if $|h| < \delta$,
\begin{align*}
f'(c) |h| &= |f(c + h) - f(c) - r(h)| \\
&\le |f(c + h) - f(c)| + |r(h)| \\
&\le |f(c + h) - f(c)| + \frac{1}{2} f'(c) |h|.
\end{align*}
Absorbing the term proportional to $|h|$ on the right hand side of this inequality into
the left hand side and writing
\[ f(c + h) = f(c) + k, \]
we find that
\[ \frac{1}{2} f'(c) |h| \le |k| \qquad \text{for } |h| < \delta. \]
Choosing $\delta > 0$ small enough that $(c - \delta, c + \delta) \subset U$, we can express $h$ in terms of
$k$ as
\[ h = f^{-1}(f(c) + k) - f^{-1}(f(c)). \]




Using this expression in the expansion of $f$ evaluated at $c + h$,
\[ f(c + h) = f(c) + f'(c) h + r(h), \]
we get that
\[ f(c) + k = f(c) + f'(c) \left[ f^{-1}(f(c) + k) - f^{-1}(f(c)) \right] + r(h). \]
Simplifying and rearranging this equation, we obtain the corresponding expansion
for $f^{-1}$ evaluated at $f(c) + k$,
\[ f^{-1}(f(c) + k) = f^{-1}(f(c)) + \frac{1}{f'(c)} k + s(k), \]
where the remainder $s$ is given by
\[ s(k) = -\frac{1}{f'(c)} r(h) = -\frac{1}{f'(c)} r\left( f^{-1}(f(c) + k) - f^{-1}(f(c)) \right). \]
Since $f'(c)|h|/2 \le |k|$, it follows that
\[ \frac{|s(k)|}{|k|} \le \frac{2}{f'(c)^2} \frac{|r(h)|}{|h|}. \]
Therefore, by the “sandwich” theorem and the fact that $h \to 0$ as $k \to 0$,
\[ \lim_{k \to 0} \frac{|s(k)|}{|k|} = 0. \]
This result proves that $f^{-1}$ is differentiable at $f(c)$ with
\[ \left( f^{-1} \right)'(f(c)) = \frac{1}{f'(c)}. \]
The expression for the derivative of the inverse also follows from Proposition
but only once we know that $f^{-1}$ is differentiable at $f(c)$.


One can show that Theorem 8.51 remains true under the weaker hypothesis that
the derivative exists and is nonzero in an open neighborhood of $c$, but in practice,
we almost always apply the theorem to continuously differentiable functions.
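The derivative formula for the local inverse can be illustrated numerically. The sketch below assumes a sample function $f(x) = x^3 + x$ (chosen here for illustration, not taken from the text), which is strictly increasing on $U = (0, 2)$, inverts it by bisection, and compares a difference quotient of the inverse at $f(c)$ with $1/f'(c)$.

```python
def f(x):
    """Sample function (an assumption for illustration): strictly increasing on R."""
    return x**3 + x

def f_prime(x):
    """Derivative of the sample function."""
    return 3.0 * x**2 + 1.0

def local_inverse(y, lo=0.0, hi=2.0, tol=1e-14):
    """Solve f(x) = y for x in [lo, hi] by bisection, using that f is increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inverse_derivative_estimate(c, k=1e-5):
    """Central difference quotient of the local inverse at y = f(c)."""
    y = f(c)
    return (local_inverse(y + k) - local_inverse(y - k)) / (2.0 * k)

if __name__ == "__main__":
    c = 1.0
    print("difference quotient of inverse:", inverse_derivative_estimate(c))
    print("1 / f'(c):                     ", 1.0 / f_prime(c))
```

Bisection is used only because it needs nothing beyond monotonicity, which is exactly the property the proof of the theorem relies on.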


The inverse function theorem generalizes to functions of several variables, $f :
A \subset \mathbb{R}^n \to \mathbb{R}^n$, with a suitable generalization of the derivative of $f$ at $c$ as the linear
map $f'(c) : \mathbb{R}^n \to \mathbb{R}^n$ that approximates $f$ near $c$. A different proof of the existence
of a local inverse is required in that case, since one cannot use monotonicity
arguments.


As an example of the application of the inverse function theorem, we consider 
a simple problem from bifurcation theory. 


Example 8.52. Consider the transcendental equation
\[ y = x - k(e^x - 1) \]
where $k \in \mathbb{R}$ is a constant parameter. Suppose that we want to solve for $x \in \mathbb{R}$
given $y \in \mathbb{R}$. If $y = 0$, then an obvious solution is $x = 0$. The inverse function
theorem applied to the continuously differentiable function $f(x; k) = x - k(e^x - 1)$
implies that there are neighborhoods $U$, $V$ of $0$ (depending on $k$) such that the
equation has a unique solution $x \in U$ for every $y \in V$ provided that the derivative
of $f$ with respect to $x$ at $0$, given by $f_x(0; k) = 1 - k$, is non-zero, i.e., provided that
$k \ne 1$ (see Figure 2).

Figure 2. Graph of $y = f(x; k)$ for the function in Example 8.52: (a) $k = 0.5$
(green); (b) $k = 1$ (blue); (c) $k = 1.5$ (red). When $y$ is sufficiently close to
zero, there is a unique solution for $x$ in some neighborhood of zero unless $k = 1$.




Figure 3. Plot of the solutions for $x$ of the nonlinear equation $x = k(e^x - 1)$
as a function of the parameter $k$ (see Example 8.52). The point $(x, k) = (0, 1)$
where the two solution branches cross is called a bifurcation point.


Alternatively, we can fix a value of $y$, say $y = 0$, and ask how the solutions of
the corresponding equation for $x$,
\[ x - k(e^x - 1) = 0, \]
depend on the parameter $k$. Figure 3 plots the solutions for $x$ as a function of $k$ for
$0.2 \le k \le 2$. The equation has two different solutions for $x$ unless $k = 1$. The branch
of nonzero solutions crosses the branch of zero solutions at the point $(x, k) = (0, 1)$,
called a bifurcation point. The implicit function theorem, which is a generalization
of the inverse function theorem, implies that a necessary condition for a solution
$(x_0, k_0)$ of the equation $f(x; k) = 0$ to be a bifurcation point, meaning that the
equation fails to have a unique solution branch $x = g(k)$ in some neighborhood of
$(x_0, k_0)$, is that $f_x(x_0; k_0) = 0$.
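The nonzero solution branch described above can be located numerically. The following sketch uses bisection on brackets chosen by hand for two sample parameter values ($k = 0.5$ and $k = 1.5$); the helper names and the brackets are assumptions made for illustration.

```python
import math

def h(x, k):
    """Left-hand side of the equation x - k (e^x - 1) = 0."""
    return x - k * (math.exp(x) - 1.0)

def bisect(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi], assuming func changes sign on the bracket."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Nonzero solutions on either side of the bifurcation value k = 1.
root_k_below_1 = bisect(lambda x: h(x, 0.5), 1.0, 2.0)     # positive branch
root_k_above_1 = bisect(lambda x: h(x, 1.5), -1.0, -0.5)   # negative branch

if __name__ == "__main__":
    print("k = 0.5: nonzero solution x =", root_k_below_1)
    print("k = 1.5: nonzero solution x =", root_k_above_1)
```

In both cases $x = 0$ is a second solution, so the equation has two solutions for these values of $k$, consistent with the two branches crossing at $(x, k) = (0, 1)$.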


8.8. * L’Hôspital’s rule


In this section, we prove a rule (much beloved by calculus students) for the evaluation
of indeterminate limits of the form $0/0$ or $\infty/\infty$. Our proof uses the following
generalization of the mean value theorem.


Theorem 8.53 (Cauchy mean value). Suppose that $f, g : [a, b] \to \mathbb{R}$ are continuous
on the closed, bounded interval $[a, b]$ and differentiable on the open interval $(a, b)$.
Then there exists $a < c < b$ such that
\[ f'(c) \left[ g(b) - g(a) \right] = \left[ f(b) - f(a) \right] g'(c). \]

Proof. The function $h : [a, b] \to \mathbb{R}$ defined by
\[ h(x) = \left[ f(x) - f(a) \right] \left[ g(b) - g(a) \right] - \left[ f(b) - f(a) \right] \left[ g(x) - g(a) \right] \]
is continuous on $[a, b]$ and differentiable on $(a, b)$ with
\[ h'(x) = f'(x) \left[ g(b) - g(a) \right] - \left[ f(b) - f(a) \right] g'(x). \]
Moreover, $h(a) = h(b) = 0$. Rolle’s Theorem implies that there exists $a < c < b$
such that $h'(c) = 0$, which proves the result.


If $g(x) = x$, then this theorem reduces to the usual mean value theorem (Theorem 8.33).
Next, we state one form of l’Hôspital’s rule.

Theorem 8.54 (l’Hôspital’s rule: $0/0$). Suppose that $f, g : (a, b) \to \mathbb{R}$ are differentiable
functions on a bounded open interval $(a, b)$ such that $g'(x) \ne 0$ for $x \in (a, b)$
and
\[ \lim_{x \to a^+} f(x) = 0, \qquad \lim_{x \to a^+} g(x) = 0. \]
Then
\[ \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L \quad \text{implies that} \quad \lim_{x \to a^+} \frac{f(x)}{g(x)} = L. \]


Proof. We may extend $f$, $g$ to continuous functions $f, g : [a, b) \to \mathbb{R}$ by defining
$f(a) = g(a) = 0$. If $a < x < b$, then by the mean value theorem, there exists
$a < c < x$ such that
\[ g(x) = g(x) - g(a) = g'(c)(x - a) \ne 0, \]
so $g \ne 0$ on $(a, b)$. Moreover, by the Cauchy mean value theorem (Theorem 8.53),
there exists $a < c < x$ such that
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)}. \]
Since $c \to a^+$ as $x \to a^+$, the result follows. (In fact, since $a < c < x$, the $\delta$ that
“works” for $f'/g'$ also “works” for $f/g$.)


Example 8.55. Using l’Hôspital’s rule twice (verify that all of the hypotheses are
satisfied!), we find that
\[ \lim_{x \to 0^+} \frac{1 - \cos x}{x^2} = \lim_{x \to 0^+} \frac{\sin x}{2x} = \lim_{x \to 0^+} \frac{\cos x}{2} = \frac{1}{2}. \]

Analogous results and proofs apply to left limits ($x \to a^-$), two-sided limits
($x \to a$), and infinite limits ($x \to \infty$ or $x \to -\infty$). Alternatively, one can reduce
these limits to the right limit considered in Theorem 8.54.


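For example, suppose that $f, g : (a, \infty) \to \mathbb{R}$ are differentiable, $g' \ne 0$, and
$f(x) \to 0$, $g(x) \to 0$ as $x \to \infty$. Assuming that $a > 0$ without loss of generality, we
define $F, G : (0, 1/a) \to \mathbb{R}$ by
\[ F(t) = f\left( \frac{1}{t} \right), \qquad G(t) = g\left( \frac{1}{t} \right). \]
The chain rule implies that
\[ F'(t) = -\frac{1}{t^2} f'\left( \frac{1}{t} \right), \qquad G'(t) = -\frac{1}{t^2} g'\left( \frac{1}{t} \right). \]
Replacing limits as $x \to \infty$ by equivalent limits as $t \to 0^+$ and applying Theorem 8.54
to $F$, $G$, all of whose hypotheses are satisfied if the limit of $f'(x)/g'(x)$ as
$x \to \infty$ exists, we get
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{t \to 0^+} \frac{F(t)}{G(t)} = \lim_{t \to 0^+} \frac{F'(t)}{G'(t)} = \lim_{x \to \infty} \frac{f'(x)}{g'(x)}. \]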


A less straightforward generalization is to the case when $g$ and possibly $f$ have
infinite limits as $x \to a^+$. In that case, we cannot simply extend $f$ and $g$ by
continuity to the point $a$. Instead, we introduce two points $a < x < y < b$ and
consider the limits $x \to a^+$ followed by $y \to a^+$.

Theorem 8.56 (l’Hôspital’s rule: $\infty/\infty$). Suppose that $f, g : (a, b) \to \mathbb{R}$ are
differentiable functions on a bounded open interval $(a, b)$ such that $g'(x) \ne 0$ for
$x \in (a, b)$ and
\[ \lim_{x \to a^+} |g(x)| = \infty. \]
Then
\[ \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L \quad \text{implies that} \quad \lim_{x \to a^+} \frac{f(x)}{g(x)} = L. \]


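Proof. Since $|g(x)| \to \infty$ as $x \to a^+$, we have $g \ne 0$ near $a$, and we may assume
without loss of generality that $g \ne 0$ on $(a, b)$. If $a < x < y < b$, then the mean
value theorem implies that $g(x) - g(y) \ne 0$, since $g' \ne 0$, and the Cauchy mean
value theorem implies that there exists $x < c < y$ such that
\[ \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(c)}{g'(c)}. \]
We may therefore write
\begin{align*}
\frac{f(x)}{g(x)} &= \left[ \frac{f(x) - f(y)}{g(x) - g(y)} \right] \left[ \frac{g(x) - g(y)}{g(x)} \right] + \frac{f(y)}{g(x)} \\
&= \frac{f'(c)}{g'(c)} \left[ 1 - \frac{g(y)}{g(x)} \right] + \frac{f(y)}{g(x)}.
\end{align*}
It follows that
\[ \left| \frac{f(x)}{g(x)} - L \right| \le \left| \frac{f'(c)}{g'(c)} - L \right| + \left| \frac{f'(c)}{g'(c)} \right| \left| \frac{g(y)}{g(x)} \right| + \left| \frac{f(y)}{g(x)} \right|. \]
Given $\epsilon > 0$, choose $\delta > 0$ such that
\[ \left| \frac{f'(c)}{g'(c)} - L \right| < \epsilon \qquad \text{for } a < c < a + \delta. \]
Then, since $a < c < y$, we have for all $a < x < y < a + \delta$ that
\[ \left| \frac{f(x)}{g(x)} - L \right| < \epsilon + \left( |L| + \epsilon \right) \left| \frac{g(y)}{g(x)} \right| + \left| \frac{f(y)}{g(x)} \right|. \]
Fixing $y$, taking the lim sup of this inequality as $x \to a^+$, and using the assumption
that $|g(x)| \to \infty$, we find that
\[ \limsup_{x \to a^+} \left| \frac{f(x)}{g(x)} - L \right| \le \epsilon. \]
Since $\epsilon > 0$ is arbitrary, we have
\[ \limsup_{x \to a^+} \left| \frac{f(x)}{g(x)} - L \right| = 0, \]
which proves the result.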
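
Alternatively, instead of using the lim sup, we can verify the limit explicitly by
an “$\epsilon/3$”-argument. Given $\epsilon > 0$, choose $\eta > 0$ such that
\[ \left| \frac{f'(c)}{g'(c)} - L \right| < \frac{\epsilon}{3} \qquad \text{for } a < c < a + \eta, \]
choose $a < y < a + \eta$, and let $\delta_1 = y - a > 0$. Next, choose $\delta_2 > 0$ such that
\[ |g(x)| > \frac{3}{\epsilon} \left( |L| + \frac{\epsilon}{3} \right) |g(y)| \qquad \text{for } a < x < a + \delta_2, \]
and choose $\delta_3 > 0$ such that
\[ |g(x)| > \frac{3}{\epsilon} |f(y)| \qquad \text{for } a < x < a + \delta_3. \]
Let $\delta = \min(\delta_1, \delta_2, \delta_3) > 0$. Then for $a < x < a + \delta$, we have
\[ \left| \frac{f(x)}{g(x)} - L \right| \le \left| \frac{f'(c)}{g'(c)} - L \right| + \left| \frac{f'(c)}{g'(c)} \right| \left| \frac{g(y)}{g(x)} \right| + \left| \frac{f(y)}{g(x)} \right| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3}, \]
which proves the result.

We often use this result when both $f(x)$ and $g(x)$ diverge to infinity as $x \to a^+$,
but no assumption on the behavior of $f(x)$ is required.

As for the previous theorem, analogous results and proofs apply to other limits
($x \to a^-$, $x \to a$, or $x \to \pm\infty$). There are also versions of l’Hôspital’s rule that
imply the divergence of $f(x)/g(x)$ to $\pm\infty$, but we consider here only the case of a
finite limit $L$.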


Example 8.57. Since $e^x \to \infty$ as $x \to \infty$, we get by l’Hôspital’s rule that
\[ \lim_{x \to \infty} \frac{x}{e^x} = \lim_{x \to \infty} \frac{1}{e^x} = 0. \]
Similarly, since $x \to \infty$ as $x \to \infty$, we get by l’Hôspital’s rule that
\[ \lim_{x \to \infty} \frac{\log x}{x} = \lim_{x \to \infty} \frac{1/x}{1} = 0. \]
That is, $e^x$ grows faster than $x$ and $\log x$ grows slower than $x$ as $x \to \infty$. We
also write these limits using “little oh” notation as $x = o(e^x)$ and $\log x = o(x)$ as
$x \to \infty$.


Finally, we note that one cannot use l’Hôspital’s rule “in reverse” to deduce 
that f’/g’ has a limit if f/g has a limit. 


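Example 8.58. Let $f(x) = x + \sin x$ and $g(x) = x$. Then $f(x), g(x) \to \infty$ as
$x \to \infty$ and
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \left( 1 + \frac{\sin x}{x} \right) = 1, \]
but the limit
\[ \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \left( 1 + \cos x \right) \]
does not exist.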
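A quick numerical look at Example 8.58 shows the contrast between the two ratios. This is only a sketch; the function names are illustrative, and the sample points are chosen so that $\cos x$ is close to $1$ or $-1$.

```python
import math

def ratio(x):
    """f(x)/g(x) for f(x) = x + sin x and g(x) = x."""
    return (x + math.sin(x)) / x

def derivative_ratio(x):
    """f'(x)/g'(x) = 1 + cos x."""
    return 1.0 + math.cos(x)

if __name__ == "__main__":
    # f/g settles down to 1, while f'/g' keeps oscillating between 0 and 2.
    for m in (10, 100, 1000):
        even, odd = 2 * m * math.pi, (2 * m + 1) * math.pi
        print(m, ratio(even), derivative_ratio(even), derivative_ratio(odd))
```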


Chapter 9 


Sequences and Series of 
Functions 


In this chapter, we define and study the convergence of sequences and series of 
functions. There are many different ways to define the convergence of a sequence 
of functions, and different definitions lead to inequivalent types of convergence. We 
consider here two basic types: pointwise and uniform convergence. 


9.1. Pointwise convergence 


Pointwise convergence defines the convergence of functions in terms of the conver- 
gence of their values at each point of their domain. 


Definition 9.1. Suppose that $(f_n)$ is a sequence of functions $f_n : A \to \mathbb{R}$ and
$f : A \to \mathbb{R}$. Then $f_n \to f$ pointwise on $A$ if $f_n(x) \to f(x)$ as $n \to \infty$ for every
$x \in A$.

We say that the sequence $(f_n)$ converges pointwise if it converges pointwise to
some function $f$, in which case
\[ f(x) = \lim_{n \to \infty} f_n(x). \]


Pointwise convergence is, perhaps, the most obvious way to define the convergence 
of functions, and it is one of the most important. Nevertheless, as the following 
examples illustrate, it is not as well-behaved as one might initially expect. 


Example 9.2. Suppose that $f_n : (0, 1) \to \mathbb{R}$ is defined by
\[ f_n(x) = \frac{n}{nx + 1}. \]
Then, since $x \ne 0$,
\[ \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{1}{x + 1/n} = \frac{1}{x}, \]
so $f_n \to f$ pointwise where $f : (0, 1) \to \mathbb{R}$ is given by
\[ f(x) = \frac{1}{x}. \]
We have $|f_n(x)| < n$ for all $x \in (0, 1)$, so each $f_n$ is bounded on $(0, 1)$, but the
pointwise limit $f$ is not. Thus, pointwise convergence does not, in general, preserve
boundedness.


Example 9.3. Suppose that $f_n : [0, 1] \to \mathbb{R}$ is defined by $f_n(x) = x^n$. If $0 \le x < 1$,
then $x^n \to 0$ as $n \to \infty$, while if $x = 1$, then $x^n \to 1$ as $n \to \infty$. So $f_n \to f$
pointwise where
\[ f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1. \end{cases} \]
Although each $f_n$ is continuous on $[0, 1]$, the pointwise limit $f$ is not (it is discontinuous
at $1$). Thus, pointwise convergence does not, in general, preserve continuity.


Example 9.4. Define $f_n : [0, 1] \to \mathbb{R}$ by
\[ f_n(x) = \begin{cases} 2n^2 x & \text{if } 0 \le x \le 1/(2n), \\ 2n^2 (1/n - x) & \text{if } 1/(2n) < x < 1/n, \\ 0 & \text{if } 1/n \le x \le 1. \end{cases} \]
If $0 < x \le 1$, then $f_n(x) = 0$ for all $n \ge 1/x$, so $f_n(x) \to 0$ as $n \to \infty$; and if
$x = 0$, then $f_n(x) = 0$ for all $n$, so $f_n(x) \to 0$ also. It follows that $f_n \to 0$ pointwise
on $[0, 1]$. This is the case even though $\max f_n = n \to \infty$ as $n \to \infty$. Thus, a
pointwise convergent sequence $(f_n)$ of functions need not be uniformly bounded
(that is, bounded independently of $n$), even if it converges to zero.
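The behavior in Example 9.4 can be seen directly by evaluating the functions. The sketch below implements the piecewise definition (the name `tent` is an illustrative choice) and shows that the peak value grows like $n$ while the value at any fixed point is eventually zero.

```python
import math

def tent(n, x):
    """The function f_n of Example 9.4 on [0, 1]: a spike of height n on [0, 1/n]."""
    if 0 <= x <= 1 / (2 * n):
        return 2 * n**2 * x
    if 1 / (2 * n) < x < 1 / n:
        return 2 * n**2 * (1 / n - x)
    return 0.0

if __name__ == "__main__":
    x_fixed = 0.3
    for n in (1, 2, 4, 10, 100):
        peak = tent(n, 1 / (2 * n))      # the maximum, attained at x = 1/(2n)
        print(n, peak, tent(n, x_fixed))  # peak grows, value at x_fixed becomes 0
```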


Example 9.5. Define $f_n : \mathbb{R} \to \mathbb{R}$ by
\[ f_n(x) = \frac{\sin nx}{n}. \]
Then $f_n \to 0$ pointwise on $\mathbb{R}$. The sequence $(f_n')$ of derivatives $f_n'(x) = \cos nx$ does
not converge pointwise on $\mathbb{R}$; for example,
\[ f_n'(\pi) = (-1)^n \]
does not converge as $n \to \infty$. Thus, in general, one cannot differentiate a pointwise
convergent sequence. This behavior isn’t limited to pointwise convergent sequences,
and happens because the derivative of a small, rapidly oscillating function can be
large.


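Example 9.6. Define $f_n : \mathbb{R} \to \mathbb{R}$ by
\[ f_n(x) = \frac{x^2}{\sqrt{x^2 + 1/n}}. \]
If $x \ne 0$, then
\[ \lim_{n \to \infty} \frac{x^2}{\sqrt{x^2 + 1/n}} = \frac{x^2}{|x|} = |x|, \]
while $f_n(0) = 0$ for all $n \in \mathbb{N}$, so $f_n \to |x|$ pointwise on $\mathbb{R}$. Moreover,
\[ f_n'(x) = \frac{x^3 + 2x/n}{(x^2 + 1/n)^{3/2}} \to \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases} \]
The pointwise limit $|x|$ isn’t differentiable at $0$ even though all of the $f_n$ are differentiable
on $\mathbb{R}$ and the derivatives $f_n'$ converge pointwise on $\mathbb{R}$. (The $f_n$’s “round
off” the corner in the absolute value function.)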


Example 9.7. Define $f_n : \mathbb{R} \to \mathbb{R}$ by
\[ f_n(x) = \left( 1 + \frac{x}{n} \right)^n. \]
Then, by the limit formula for the exponential, $f_n \to e^x$ pointwise on $\mathbb{R}$.
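The pointwise convergence in Example 9.7 is easy to observe numerically. In this sketch the function name `f_n` is an illustrative choice, and the values of $n$ are kept moderate so that floating-point rounding in $1 + x/n$ does not dominate.

```python
import math

def f_n(n, x):
    """The function (1 + x/n)**n from Example 9.7."""
    return (1.0 + x / n) ** n

if __name__ == "__main__":
    for x in (-2.0, 1.0, 3.0):
        for n in (10, 1000, 10**6):
            # The error at a fixed x shrinks as n grows, but more slowly for larger |x|.
            print(x, n, abs(f_n(n, x) - math.exp(x)))
```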


9.2. Uniform convergence 


In this section, we introduce a stronger notion of convergence of functions than 
pointwise convergence, called uniform convergence. The difference between point- 
wise convergence and uniform convergence is analogous to the difference between 
continuity and uniform continuity. 


Definition 9.8. Suppose that $(f_n)$ is a sequence of functions $f_n : A \to \mathbb{R}$ and
$f : A \to \mathbb{R}$. Then $f_n \to f$ uniformly on $A$ if, for every $\epsilon > 0$, there exists $N \in \mathbb{N}$
such that
\[ n > N \text{ implies that } |f_n(x) - f(x)| < \epsilon \text{ for all } x \in A. \]

When the domain $A$ of the functions is understood, we will often say $f_n \to f$
uniformly instead of uniformly on $A$.

The crucial point in this definition is that $N$ depends only on $\epsilon$ and not on
$x \in A$, whereas for a pointwise convergent sequence $N$ may depend on both $\epsilon$ and
$x$. A uniformly convergent sequence is always pointwise convergent (to the same
limit), but the converse is not true. If a sequence converges pointwise, it may
happen that for some $\epsilon > 0$ one needs to choose arbitrarily large $N$’s for different
points $x \in A$, meaning that the sequences of values converge arbitrarily slowly
on $A$. In that case a pointwise convergent sequence of functions is not uniformly
convergent.


Example 9.9. The sequence $f_n(x) = x^n$ in Example 9.3 converges pointwise on
$[0, 1]$ but not uniformly on $[0, 1]$. For $0 \le x < 1$, we have
\[ |f_n(x) - f(x)| = x^n. \]
If $0 < \epsilon < 1$, we cannot make $x^n < \epsilon$ for all $0 \le x < 1$ however large we choose
$n$. The problem is that $x^n$ converges to $0$ at an arbitrarily slow rate for $x$ sufficiently
close to $1$. There is no difficulty in the rate of convergence at $1$ itself,
since $f_n(1) = 1$ for every $n \in \mathbb{N}$. As we will show, the uniform limit of continuous
functions is continuous, so since the pointwise limit of the continuous functions $f_n$
is discontinuous, the sequence cannot converge uniformly on $[0, 1]$. The sequence
does, however, converge uniformly to $0$ on $[0, b]$ for every $0 \le b < 1$; given $\epsilon > 0$,
we take $N$ large enough that $b^N < \epsilon$.
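The two halves of Example 9.9 can be made concrete. The sketch below (with illustrative helper names) computes the $N$ that works uniformly on $[0, b]$, and exhibits, for each $n$, a point of $(0, 1)$ at which $x^n$ has not yet dropped below a given $\epsilon$.

```python
def smallest_uniform_N(b, eps):
    """Smallest N >= 1 with b**N < eps, assuming 0 <= b < 1 and eps > 0.

    Since x**n is increasing in x on [0, b], this N works for every x in [0, b].
    """
    N = 1
    while b**N >= eps:
        N += 1
    return N

def slow_point(n, eps):
    """A point x in (0, 1) with x**n equal to eps (up to rounding), for 0 < eps < 1."""
    return eps ** (1.0 / n)

if __name__ == "__main__":
    print("N for b = 0.9, eps = 0.01:", smallest_uniform_N(0.9, 0.01))
    for n in (10, 100, 1000):
        x = slow_point(n, 0.5)
        print(n, x, x**n)   # x creeps toward 1 while x**n stays at 0.5
```

The second loop is the obstruction to uniform convergence on $[0, 1)$: no single $N$ can serve all such points.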




Example 9.10. The pointwise convergent sequence in Example 9.4 does not converge
uniformly. If it did, it would have to converge to the pointwise limit $0$, but
\[ \left| f_n\left( \frac{1}{2n} \right) \right| = n, \]
so for no $\epsilon > 0$ does there exist an $N \in \mathbb{N}$ such that $|f_n(x) - 0| < \epsilon$ for all $x \in A$
and $n > N$, since this inequality fails for $n \ge \epsilon$ if $x = 1/(2n)$.

Example 9.11. The functions in Example 9.5 converge uniformly to $0$ on $\mathbb{R}$, since
\[ |f_n(x)| = \frac{|\sin nx|}{n} \le \frac{1}{n}, \]
so $|f_n(x) - 0| < \epsilon$ for all $x \in \mathbb{R}$ if $n > 1/\epsilon$.


9.3. Cauchy condition for uniform convergence 


The Cauchy condition in Definition 3.45 provides a necessary and sufficient condition
for a sequence of real numbers to converge. There is an analogous uniform
Cauchy condition that provides a necessary and sufficient condition for a sequence
of functions to converge uniformly.

Definition 9.12. A sequence $(f_n)$ of functions $f_n : A \to \mathbb{R}$ is uniformly Cauchy
on $A$ if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
\[ m, n > N \text{ implies that } |f_m(x) - f_n(x)| < \epsilon \text{ for all } x \in A. \]

The key part of the following proof is the argument to show that a pointwise
convergent, uniformly Cauchy sequence converges uniformly.

Theorem 9.13. A sequence $(f_n)$ of functions $f_n : A \to \mathbb{R}$ converges uniformly on
$A$ if and only if it is uniformly Cauchy on $A$.


Proof. Suppose that $(f_n)$ converges uniformly to $f$ on $A$. Then, given $\epsilon > 0$, there
exists $N \in \mathbb{N}$ such that
\[ |f_n(x) - f(x)| < \frac{\epsilon}{2} \qquad \text{for all } x \in A \text{ if } n > N. \]
It follows that if $m, n > N$ then
\[ |f_m(x) - f_n(x)| \le |f_m(x) - f(x)| + |f(x) - f_n(x)| < \epsilon \qquad \text{for all } x \in A, \]
which shows that $(f_n)$ is uniformly Cauchy.

Conversely, suppose that $(f_n)$ is uniformly Cauchy. Then for each $x \in A$, the
real sequence $(f_n(x))$ is Cauchy, so it converges by the completeness of $\mathbb{R}$. We
define $f : A \to \mathbb{R}$ by
\[ f(x) = \lim_{n \to \infty} f_n(x), \]
and then $f_n \to f$ pointwise.

To prove that $f_n \to f$ uniformly, let $\epsilon > 0$. Since $(f_n)$ is uniformly Cauchy, we
can choose $N \in \mathbb{N}$ (depending only on $\epsilon$) such that
\[ |f_m(x) - f_n(x)| < \frac{\epsilon}{2} \qquad \text{for all } x \in A \text{ if } m, n > N. \]
Let $n > N$ and $x \in A$. Then for every $m > N$ we have
\[ |f_n(x) - f(x)| \le |f_n(x) - f_m(x)| + |f_m(x) - f(x)| < \frac{\epsilon}{2} + |f_m(x) - f(x)|. \]
Since $f_m(x) \to f(x)$ as $m \to \infty$, we can choose $m > N$ (depending on $x$, but it
doesn’t matter since $m$ doesn’t appear in the final result) such that
\[ |f_m(x) - f(x)| < \frac{\epsilon}{2}. \]
It follows that if $n > N$, then
\[ |f_n(x) - f(x)| < \epsilon \qquad \text{for all } x \in A, \]
which proves that $f_n \to f$ uniformly.

Alternatively, we can take the limit as $m \to \infty$ in the uniform Cauchy condition
to get for all $x \in A$ and $n > N$ that
\[ |f(x) - f_n(x)| = \lim_{m \to \infty} |f_m(x) - f_n(x)| \le \frac{\epsilon}{2} < \epsilon. \]


9.4. Properties of uniform convergence 


In this section we prove that, unlike pointwise convergence, uniform convergence 
preserves boundedness and continuity. Uniform convergence does not preserve dif- 
ferentiability any better than pointwise convergence. Nevertheless, we give a result 
that allows us to differentiate a convergent sequence; the key assumption is that 
the derivatives converge uniformly. 


9.4.1. Boundedness. First, we consider the uniform convergence of bounded 
functions. 


Theorem 9.14. Suppose that $f_n : A \to \mathbb{R}$ is bounded on $A$ for every $n \in \mathbb{N}$ and
$f_n \to f$ uniformly on $A$. Then $f : A \to \mathbb{R}$ is bounded on $A$.

Proof. Taking $\epsilon = 1$ in the definition of the uniform convergence, we find that
there exists $N \in \mathbb{N}$ such that
\[ |f_n(x) - f(x)| < 1 \qquad \text{for all } x \in A \text{ if } n > N. \]
Choose some $n > N$. Then, since $f_n$ is bounded, there is a constant $M \ge 0$ such
that
\[ |f_n(x)| \le M \qquad \text{for all } x \in A. \]
It follows that
\[ |f(x)| \le |f(x) - f_n(x)| + |f_n(x)| < 1 + M \qquad \text{for all } x \in A, \]
meaning that $f$ is bounded on $A$.


In particular, it follows that if a sequence of bounded functions converges point- 
wise to an unbounded function, then the convergence is not uniform. 




Example 9.15. The sequence of functions $f_n : (0, 1) \to \mathbb{R}$ in Example 9.2, defined
by
\[ f_n(x) = \frac{n}{nx + 1}, \]
cannot converge uniformly on $(0, 1)$, since each $f_n$ is bounded on $(0, 1)$, but the
pointwise limit $f(x) = 1/x$ is not. The sequence $(f_n)$ does, however, converge
uniformly to $f$ on every interval $[a, 1)$ with $0 < a < 1$. To prove this, we estimate
for $a \le x < 1$ that
\[ |f_n(x) - f(x)| = \left| \frac{n}{nx + 1} - \frac{1}{x} \right| = \frac{1}{x(nx + 1)} < \frac{1}{n x^2} \le \frac{1}{n a^2}. \]
Thus, given $\epsilon > 0$ choose $N = 1/(a^2 \epsilon)$, and then
\[ |f_n(x) - f(x)| < \epsilon \qquad \text{for all } x \in [a, 1) \text{ if } n > N, \]
which proves that $f_n \to f$ uniformly on $[a, 1)$. Note that
\[ |f(x)| \le \frac{1}{a} \qquad \text{for all } x \in [a, 1) \]
so the uniform limit $f$ is bounded on $[a, 1)$, as Theorem 9.14 requires.


9.4.2. Continuity. One of the most important properties of uniform convergence
is that it preserves continuity. We use an “$\epsilon/3$” argument to get the continuity
of the uniform limit $f$ from the continuity of the $f_n$.

Theorem 9.16. If a sequence $(f_n)$ of continuous functions $f_n : A \to \mathbb{R}$ converges
uniformly on $A \subset \mathbb{R}$ to $f : A \to \mathbb{R}$, then $f$ is continuous on $A$.

Proof. Suppose that $c \in A$ and let $\epsilon > 0$. Then, for every $n \in \mathbb{N}$,
\[ |f(x) - f(c)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)|. \]
By the uniform convergence of $(f_n)$, we can choose $n \in \mathbb{N}$ such that
\[ |f_n(x) - f(x)| < \frac{\epsilon}{3} \qquad \text{for all } x \in A, \]
and for such an $n$ it follows that
\[ |f(x) - f(c)| < |f_n(x) - f_n(c)| + \frac{2\epsilon}{3}. \]
(Here, we use the fact that $f_n$ is close to $f$ at both $x$ and $c$, where $x$ is an arbitrary
point in a neighborhood of $c$; this is where we use the uniform convergence in a
crucial way.)

Since $f_n$ is continuous on $A$, there exists $\delta > 0$ such that
\[ |f_n(x) - f_n(c)| < \frac{\epsilon}{3} \qquad \text{if } |x - c| < \delta \text{ and } x \in A, \]
which implies that
\[ |f(x) - f(c)| < \epsilon \qquad \text{if } |x - c| < \delta \text{ and } x \in A. \]
This proves that $f$ is continuous.




This result can be interpreted as justifying an “exchange in the order of limits”
\[ \lim_{n \to \infty} \lim_{x \to c} f_n(x) = \lim_{x \to c} \lim_{n \to \infty} f_n(x). \]
Such exchanges of limits always require some sort of condition for their validity; in
this case, the uniform convergence of $f_n$ to $f$ is sufficient, but pointwise convergence
is not.

It follows from Theorem 9.16 that if a sequence of continuous functions converges
pointwise to a discontinuous function, as in Example 9.3, then the convergence
is not uniform. The converse is not true, however, and the pointwise limit
of a sequence of continuous functions may be continuous even if the convergence is
not uniform, as in Example 9.4.


9.4.3. Differentiability. The uniform convergence of differentiable functions
does not, in general, imply anything about the convergence of their derivatives or
the differentiability of their limit. As noted above, this is because the values of
two functions may be close together while the values of their derivatives are far
apart (if, for example, one function varies slowly while the other oscillates rapidly,
as in Example 9.5). Thus, we have to impose strong conditions on a sequence of
functions and their derivatives if we hope to prove that $f_n \to f$ implies $f_n' \to f'$.


The following example shows that the limit of the derivatives need not equal 
the derivative of the limit even if a sequence of differentiable functions converges 
uniformly and their derivatives converge pointwise. 


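Example 9.17. Consider the sequence $(f_n)$ of functions $f_n : \mathbb{R} \to \mathbb{R}$ defined by
\[ f_n(x) = \frac{x}{1 + n x^2}. \]
Then $f_n \to 0$ uniformly on $\mathbb{R}$. To see this, we write
\[ |f_n(x)| = \frac{1}{\sqrt{n}} \left( \frac{\sqrt{n} |x|}{1 + n x^2} \right) = \frac{1}{\sqrt{n}} \left( \frac{t}{1 + t^2} \right) \]
where $t = \sqrt{n} |x|$. We have
\[ \frac{t}{1 + t^2} \le \frac{1}{2} \qquad \text{for all } t \in \mathbb{R}, \]
since $(1 - t)^2 \ge 0$, which implies that $2t \le 1 + t^2$. Using this inequality, we get
\[ |f_n(x)| \le \frac{1}{2\sqrt{n}} \qquad \text{for all } x \in \mathbb{R}. \]
Hence, given $\epsilon > 0$, choose $N = 1/(4\epsilon^2)$. Then
\[ |f_n(x)| < \epsilon \qquad \text{for all } x \in \mathbb{R} \text{ if } n > N, \]
which proves that $(f_n)$ converges uniformly to $0$ on $\mathbb{R}$. (Alternatively, we could get
the same result by using calculus to compute the maximum value of $|f_n|$ on $\mathbb{R}$.)
Each $f_n$ is differentiable with
\[ f_n'(x) = \frac{1 - n x^2}{(1 + n x^2)^2}. \]
It follows that $f_n' \to g$ pointwise as $n \to \infty$ where
\[ g(x) = \begin{cases} 0 & \text{if } x \ne 0, \\ 1 & \text{if } x = 0. \end{cases} \]
The convergence is not uniform since $g$ is discontinuous at $0$. Thus, $f_n \to 0$ uniformly,
but $f_n'(0) \to 1$, so the limit of the derivatives is not the derivative of the
limit.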
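The estimates in Example 9.17 can be checked numerically. This sketch (with illustrative function names) confirms that the maximum of $|f_n|$ is $1/(2\sqrt{n})$, attained at $x = 1/\sqrt{n}$, while the derivative at $0$ stays equal to $1$ and the derivative at a fixed nonzero point tends to $0$.

```python
import math

def f(n, x):
    """f_n(x) = x / (1 + n x^2) from Example 9.17."""
    return x / (1.0 + n * x * x)

def f_prime(n, x):
    """f_n'(x) = (1 - n x^2) / (1 + n x^2)^2."""
    return (1.0 - n * x * x) / (1.0 + n * x * x) ** 2

def sup_bound(n):
    """The uniform bound 1 / (2 sqrt(n)) on |f_n|."""
    return 1.0 / (2.0 * math.sqrt(n))

if __name__ == "__main__":
    for n in (1, 100, 10**4, 10**6):
        x_max = 1.0 / math.sqrt(n)
        print(n, f(n, x_max), sup_bound(n), f_prime(n, 0.0), f_prime(n, 0.1))
```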


However, we do get a useful result if we strengthen the assumptions and require 
that the derivatives converge uniformly, not just pointwise. The proof involves a 
slightly tricky application of the mean value theorem. 


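Theorem 9.18. Suppose that $(f_n)$ is a sequence of differentiable functions $f_n :
(a, b) \to \mathbb{R}$ such that $f_n \to f$ pointwise and $f_n' \to g$ uniformly for some $f, g :
(a, b) \to \mathbb{R}$. Then $f$ is differentiable on $(a, b)$ and $f' = g$.

Proof. Let $c \in (a, b)$, and let $\epsilon > 0$ be given. To prove that $f'(c) = g(c)$, we
estimate the difference quotient of $f$ in terms of the difference quotients of the $f_n$:
\begin{align*}
\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| &\le \left| \frac{f(x) - f(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| \\
&\quad + \left| \frac{f_n(x) - f_n(c)}{x - c} - f_n'(c) \right| + \left| f_n'(c) - g(c) \right|
\end{align*}
where $x \in (a, b)$ and $x \ne c$. We want to make each of the terms on the right-hand
side of the inequality less than $\epsilon/3$. This is straightforward for the second term
(since $f_n$ is differentiable) and the third term (since $f_n' \to g$). To estimate the first
term, we approximate $f$ by $f_m$, use the mean value theorem, and let $m \to \infty$.

Since $f_m - f_n$ is differentiable, the mean value theorem implies that there exists
$\xi$ between $c$ and $x$ such that
\begin{align*}
\frac{f_m(x) - f_m(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} &= \frac{(f_m - f_n)(x) - (f_m - f_n)(c)}{x - c} \\
&= f_m'(\xi) - f_n'(\xi).
\end{align*}
Since $(f_n')$ converges uniformly, it is uniformly Cauchy by Theorem 9.13. Therefore
there exists $N_1 \in \mathbb{N}$ such that
\[ |f_m'(\xi) - f_n'(\xi)| < \frac{\epsilon}{3} \qquad \text{for all } \xi \in (a, b) \text{ if } m, n > N_1, \]
which implies that
\[ \left| \frac{f_m(x) - f_m(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| < \frac{\epsilon}{3}. \]
Taking the limit of this inequality as $m \to \infty$, and using the pointwise convergence
of $(f_m)$ to $f$, we get that
\[ \left| \frac{f(x) - f(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| \le \frac{\epsilon}{3} \qquad \text{for } n > N_1. \]
Next, since $(f_n')$ converges to $g$, there exists $N_2 \in \mathbb{N}$ such that
\[ |f_n'(c) - g(c)| < \frac{\epsilon}{3} \qquad \text{for all } n > N_2. \]
Choose some $n > \max(N_1, N_2)$. Then the differentiability of $f_n$ implies that there
exists $\delta > 0$ such that
\[ \left| \frac{f_n(x) - f_n(c)}{x - c} - f_n'(c) \right| < \frac{\epsilon}{3} \qquad \text{if } 0 < |x - c| < \delta. \]
Putting these inequalities together, we get that
\[ \left| \frac{f(x) - f(c)}{x - c} - g(c) \right| < \epsilon \qquad \text{if } 0 < |x - c| < \delta, \]
which proves that $f$ is differentiable at $c$ with $f'(c) = g(c)$.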


Like Theorem 9.16, Theorem 9.18 can be interpreted as giving sufficient conditions
for an exchange in the order of limits:
\[ \lim_{n \to \infty} \lim_{x \to c} \left[ \frac{f_n(x) - f_n(c)}{x - c} \right] = \lim_{x \to c} \lim_{n \to \infty} \left[ \frac{f_n(x) - f_n(c)}{x - c} \right]. \]

It is worth noting that in Theorem 9.18 the derivatives $f_n'$ are not assumed to
be continuous. If they are continuous, then one can use Riemann integration and
the fundamental theorem of calculus to give a simpler proof (see Theorem 12.21).


9.5. Series 


The convergence of a series is defined in terms of the convergence of its sequence of 
partial sums, and any result about sequences is easily translated into a correspond- 
ing result about series. 


Definition 9.19. Suppose that $(f_n)$ is a sequence of functions $f_n : A \to \mathbb{R}$. Let
$(S_n)$ be the sequence of partial sums $S_n : A \to \mathbb{R}$, defined by
\[ S_n(x) = \sum_{k=1}^{n} f_k(x). \]
Then the series
\[ S(x) = \sum_{n=1}^{\infty} f_n(x) \]
converges pointwise to $S : A \to \mathbb{R}$ on $A$ if $S_n \to S$ as $n \to \infty$ pointwise on $A$, and
uniformly to $S$ on $A$ if $S_n \to S$ uniformly on $A$.


We illustrate the definition with a series whose partial sums we can compute 
explicitly. 
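Example 9.20. The geometric series
\[ \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \dots \]
has partial sums
\[ S_n(x) = \sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}. \]
Thus, $S_n(x) \to 1/(1 - x)$ as $n \to \infty$ if $|x| < 1$ and diverges if $|x| \ge 1$, meaning that
\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x} \qquad \text{pointwise on } (-1, 1). \]
Since $1/(1 - x)$ is unbounded on $(-1, 1)$, Theorem 9.14 implies that the convergence
cannot be uniform.

The series does, however, converge uniformly on $[-\rho, \rho]$ for every $0 \le \rho < 1$.
To prove this, we estimate for $|x| \le \rho$ that
\[ \left| S_n(x) - \frac{1}{1 - x} \right| = \frac{|x|^{n+1}}{1 - x} \le \frac{\rho^{n+1}}{1 - \rho}. \]
Since $\rho^{n+1}/(1 - \rho) \to 0$ as $n \to \infty$, given any $\epsilon > 0$ there exists $N \in \mathbb{N}$, depending
only on $\epsilon$ and $\rho$, such that
\[ 0 \le \frac{\rho^{n+1}}{1 - \rho} < \epsilon \qquad \text{for all } n > N. \]
It follows that
\[ \left| \sum_{k=0}^{n} x^k - \frac{1}{1 - x} \right| < \epsilon \qquad \text{for all } x \in [-\rho, \rho] \text{ and all } n > N, \]
which proves that the series converges uniformly on $[-\rho, \rho]$.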
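The uniform estimate for the geometric series on $[-\rho, \rho]$ can be checked on a grid. The sketch below uses illustrative helper names and a finite grid, so it samples the supremum rather than computing it exactly.

```python
def partial_sum(n, x):
    """S_n(x) = 1 + x + ... + x**n."""
    return sum(x**k for k in range(n + 1))

def error_bound(n, rho):
    """The uniform bound rho**(n+1) / (1 - rho), valid for |x| <= rho < 1."""
    return rho ** (n + 1) / (1.0 - rho)

def max_error_on_grid(n, rho, points=2001):
    """Largest value of |S_n(x) - 1/(1 - x)| over an evenly spaced grid on [-rho, rho]."""
    worst = 0.0
    for i in range(points):
        x = -rho + 2.0 * rho * i / (points - 1)
        worst = max(worst, abs(partial_sum(n, x) - 1.0 / (1.0 - x)))
    return worst

if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        for n in (5, 20, 80):
            print(rho, n, max_error_on_grid(n, rho), error_bound(n, rho))
```

The bound is attained at $x = \rho$, and it deteriorates as $\rho$ approaches $1$, which reflects the failure of uniform convergence on all of $(-1, 1)$.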


The Cauchy condition for the uniform convergence of sequences immediately 
gives a corresponding Cauchy condition for the uniform convergence of series. 


Theorem 9.21. Let $(f_n)$ be a sequence of functions $f_n : A \to \mathbb{R}$. The series
\[ \sum_{n=1}^{\infty} f_n \]
converges uniformly on $A$ if and only if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such
that
\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| < \epsilon \qquad \text{for all } x \in A \text{ and all } n > m > N. \]

Proof. Let
\[ S_n(x) = \sum_{k=1}^{n} f_k(x) = f_1(x) + f_2(x) + \dots + f_n(x). \]
From Theorem 9.13 the sequence $(S_n)$, and therefore the series $\sum f_n$, converges
uniformly if and only if for every $\epsilon > 0$ there exists $N$ such that
\[ |S_n(x) - S_m(x)| < \epsilon \qquad \text{for all } x \in A \text{ and all } n, m > N. \]
Assuming $n > m$ without loss of generality, we have
\[ S_n(x) - S_m(x) = f_{m+1}(x) + f_{m+2}(x) + \dots + f_n(x) = \sum_{k=m+1}^{n} f_k(x), \]
so the result follows.




The following simple criterion for the uniform convergence of a series is very 
useful. The name comes from the letter traditionally used to denote the constants, 
or “majorants,” that bound the functions in the series. 


Theorem 9.22 (Weierstrass M-test). Let $(f_n)$ be a sequence of functions $f_n : A \to
\mathbb{R}$, and suppose that for every $n \in \mathbb{N}$ there exists a constant $M_n \ge 0$ such that
\[ |f_n(x)| \le M_n \text{ for all } x \in A, \qquad \sum_{n=1}^{\infty} M_n < \infty. \]
Then
\[ \sum_{n=1}^{\infty} f_n(x) \]
converges uniformly on $A$.

Proof. The result follows immediately from the observation that $\sum f_n$ is uniformly
Cauchy if $\sum M_n$ is Cauchy.

In detail, let $\epsilon > 0$ be given. The Cauchy condition for the convergence of a
real series implies that there exists $N \in \mathbb{N}$ such that
\[ \sum_{k=m+1}^{n} M_k < \epsilon \qquad \text{for all } n > m > N. \]
Then for all $x \in A$ and all $n > m > N$, we have
\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| \le \sum_{k=m+1}^{n} |f_k(x)| \le \sum_{k=m+1}^{n} M_k < \epsilon. \]
Thus, $\sum f_n$ satisfies the uniform Cauchy condition in Theorem 9.21, so it converges
uniformly.


Example 9.23. Returning to Example 9.20, we consider the geometric series
\[ \sum_{n=0}^{\infty} x^n. \]
If $|x| \le \rho$ where $0 \le \rho < 1$, then
\[ |x^n| \le \rho^n, \qquad \sum_{n=0}^{\infty} \rho^n < \infty. \]
The M-test, with $M_n = \rho^n$, implies that the series converges uniformly on $[-\rho, \rho]$.


Example 9.24. The series
\[ f(x) = \sum_{n=1}^{\infty} \frac{1}{2^n} \cos(3^n x) \]
converges uniformly on $\mathbb{R}$ by the M-test since
\[ \left| \frac{1}{2^n} \cos(3^n x) \right| \le \frac{1}{2^n}, \qquad \sum_{n=1}^{\infty} \frac{1}{2^n} = 1. \]
It then follows from Theorem 9.16 that $f$ is continuous on $\mathbb{R}$. (See Figure 1.)

Figure 1. Graph of the Weierstrass continuous, nowhere differentiable function
$y = \sum_{n=0}^{\infty} 2^{-n} \cos(3^n x)$ on one period $[0, 2\pi]$.


Taking the formal term-by-term derivative of the series for $f$, we get a series
whose coefficients grow with $n$,
\[ f'(x) \sim -\sum_{n=1}^{\infty} \left( \frac{3}{2} \right)^n \sin(3^n x), \]
so we might expect that there are difficulties in differentiating $f$. As Figure 2
illustrates, the function doesn’t look smooth at any length-scale. Weierstrass (1872)
proved that $f$ is not differentiable at any point of $\mathbb{R}$. Bolzano (1830) had also
constructed a continuous, nowhere differentiable function, but his results weren’t
published until 1922. Subsequently, Takagi (1903) constructed a similar function
to the Weierstrass function whose nowhere-differentiability is easier to prove. Such
functions were considered to be highly counter-intuitive and pathological at the time
Weierstrass discovered them, and they weren’t well-received by many prominent
mathematicians.




Figure 2. Details of the Weierstrass function for 4 < x < 4.1 (left) and 
4 < x < 4.01 (right) showing its self-similar, fractal behavior under rescalings. 


If the Weierstrass M-test applies to a series of functions to prove uniform 
convergence, then it also implies that the series converges absolutely, meaning that 


\[
\sum_{n=1}^{\infty} |f_n(x)| < \infty \quad \text{for every } x \in A.
\]
Thus, the M-test is not applicable to series that converge uniformly but not absolutely.


Absolute convergence of a series is completely different from uniform conver- 
gence, and the two concepts shouldn’t be confused. Absolute convergence on A is a 
pointwise condition for each x € A, while uniform convergence is a global condition 
that involves all points x € A simultaneously. We illustrate the difference with a 
rather trivial example. 


Example 9.25. Let $f_n : \mathbb{R} \to \mathbb{R}$ be the constant function
\[
f_n(x) = \frac{(-1)^{n+1}}{n}.
\]
Then $\sum f_n$ converges on $\mathbb{R}$ to the constant function $f(x) = c$, where
\[
c = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}
\]
is the sum of the alternating harmonic series ($c = \log 2$). The convergence of $\sum f_n$ is
uniform on $\mathbb{R}$ since the terms in the series do not depend on $x$, but the convergence
isn't absolute at any $x \in \mathbb{R}$ since the harmonic series
\[
\sum_{n=1}^{\infty} \frac{1}{n}
\]
diverges to infinity.


Chapter 10 


Power Series 


In discussing power series it is good to recall a nursery rhyme: 
“There was a little girl 
Who had a little curl 
Right in the middle of her forehead 
When she was good 
She was very, very good 
But when she was bad 
She was horrid.” 
(Robert Strichartz [14]) 


Power series are one of the most useful types of series in analysis. For example,
we can use them to define transcendental functions such as the exponential and 
trigonometric functions (as well as many other less familiar functions). 


10.1. Introduction 


A power series (centered at 0) is a series of the form
\[
\sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n + \dots
\]
where the constants $a_n$ are some coefficients. If all but finitely many of the $a_n$ are
zero, then the power series is a polynomial function, but if infinitely many of the
$a_n$ are nonzero, then we need to consider the convergence of the power series.

The basic facts are these: Every power series has a radius of convergence $0 \le R \le \infty$,
which depends on the coefficients $a_n$. The power series converges absolutely
in $|x| < R$ and diverges in $|x| > R$. Moreover, the convergence is uniform on every
interval $|x| \le \rho$ where $0 \le \rho < R$. If $R > 0$, then the sum of the power series is
infinitely differentiable in $|x| < R$, and its derivatives are given by differentiating
the original power series term-by-term.




Power series work just as well for complex numbers as real numbers, and are 
in fact best viewed from that perspective. We will consider here only real-valued 
power series, although many of the results extend immediately to complex-valued 
power series. 


Definition 10.1. Let $(a_n)_{n=0}^{\infty}$ be a sequence of real numbers and $c \in \mathbb{R}$. The power
series centered at $c$ with coefficients $a_n$ is the series
\[
\sum_{n=0}^{\infty} a_n (x - c)^n.
\]


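Example 10.2. The following are power series centered at 0:
\[
\sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + x^4 + \dots,
\]
\[
\sum_{n=0}^{\infty} \frac{1}{n!} x^n = 1 + x + \frac{1}{2} x^2 + \frac{1}{6} x^3 + \frac{1}{24} x^4 + \dots,
\]
\[
\sum_{n=0}^{\infty} (n!) x^n = 1 + x + 2x^2 + 6x^3 + 24x^4 + \dots,
\]
\[
\sum_{n=0}^{\infty} (-1)^n x^{2^n} = x - x^2 + x^4 - x^8 + \dots.
\]
An example of a power series centered at 1 is
\[
\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} (x - 1)^n = (x - 1) - \frac{1}{2} (x - 1)^2 + \frac{1}{3} (x - 1)^3 - \dots.
\]

The power series in Definition 10.1 is a formal, algebraic expression, since we
haven't said anything yet about its convergence. By changing variables $(x - c) \mapsto x$,
we can assume without loss of generality that a power series is centered at 0, and
we will do so whenever it's convenient.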


10.2. Radius of convergence 


First, we prove that every power series has a radius of convergence. 


Theorem 10.3. Let
\[
\sum_{n=0}^{\infty} a_n (x - c)^n
\]
be a power series. There is a non-negative, extended real number $0 \le R \le \infty$
such that the series converges absolutely for $0 \le |x - c| < R$ and diverges for
$|x - c| > R$. Furthermore, if $0 \le \rho < R$, then the power series converges uniformly
on the interval $|x - c| \le \rho$, and the sum of the series is continuous in $|x - c| < R$.


Proof. We assume without loss of generality that $c = 0$. Suppose the power series
\[
\sum_{n=0}^{\infty} a_n x_0^n
\]
converges for some $x_0 \in \mathbb{R}$ with $x_0 \ne 0$. Then its terms converge to zero, so they
are bounded and there exists $M \ge 0$ such that
\[
|a_n x_0^n| \le M \quad \text{for } n = 0, 1, 2, \dots.
\]
If $|x| < |x_0|$, then
\[
|a_n x^n| = |a_n x_0^n| \left| \frac{x}{x_0} \right|^n \le M r^n, \qquad r = \left| \frac{x}{x_0} \right| < 1.
\]
Comparing the power series with the convergent geometric series $\sum M r^n$, we see
that $\sum a_n x^n$ is absolutely convergent. Thus, if the power series converges for some
$x_0 \in \mathbb{R}$, then it converges absolutely for every $x \in \mathbb{R}$ with $|x| < |x_0|$.

Let
\[
R = \sup \left\{ |x| \ge 0 : \sum a_n x^n \text{ converges} \right\}.
\]
If $R = 0$, then the series converges only for $x = 0$. If $R > 0$, then the series
converges absolutely for every $x \in \mathbb{R}$ with $|x| < R$, since it converges for some
$x_0 \in \mathbb{R}$ with $|x| < |x_0| < R$. Moreover, the definition of $R$ implies that the series
diverges for every $x \in \mathbb{R}$ with $|x| > R$. If $R = \infty$, then the series converges for all
$x \in \mathbb{R}$.


Finally, let $0 \le \rho < R$ and suppose $|x| \le \rho$. Choose $\sigma > 0$ such that $\rho < \sigma < R$.
Then $\sum |a_n \sigma^n|$ converges, so $|a_n \sigma^n| \le M$, and therefore
\[
|a_n x^n| = |a_n \sigma^n| \left| \frac{x}{\sigma} \right|^n \le |a_n \sigma^n| \left| \frac{\rho}{\sigma} \right|^n \le M r^n,
\]
where $r = \rho/\sigma < 1$. Since $\sum M r^n < \infty$, the M-test (Theorem 9.22) implies that
the series converges uniformly on $|x| \le \rho$, and then it follows from Theorem 9.16
that the sum is continuous on $|x| \le \rho$. Since this holds for every $0 \le \rho < R$, the
sum is continuous in $|x| < R$.


The following definition therefore makes sense for every power series. 


Definition 10.4. If the power series
\[
\sum_{n=0}^{\infty} a_n (x - c)^n
\]
converges for $|x - c| < R$ and diverges for $|x - c| > R$, then $0 \le R \le \infty$ is called
the radius of convergence of the power series.

Theorem 10.3 does not say what happens at the endpoints $x = c \pm R$, and in
general the power series may converge or diverge there. We refer to the set of all
points where the power series converges as its interval of convergence, which is one
of
\[
(c - R, c + R), \quad (c - R, c + R], \quad [c - R, c + R), \quad [c - R, c + R].
\]
We won't discuss here any general theorems about the convergence of power series
at the endpoints (e.g., the Abel theorem). Also note that a power series need not
converge uniformly on $|x - c| < R$.


Theorem does not give an explicit expression for the radius of convergence 
of a power series in terms of its coefficients. The ratio test gives a simple, but useful, 




way to compute the radius of convergence, although it doesn’t apply to every power 
series. 


Theorem 10.5. Suppose that $a_n \ne 0$ for all sufficiently large $n$ and the limit
\[
R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|
\]
exists or diverges to infinity. Then the power series
\[
\sum_{n=0}^{\infty} a_n (x - c)^n
\]
has radius of convergence $R$.


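Proof. Let
\[
r = \lim_{n \to \infty} \left| \frac{a_{n+1} (x - c)^{n+1}}{a_n (x - c)^n} \right| = |x - c| \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|.
\]
By the ratio test, the power series converges if $0 \le r < 1$, or $|x - c| < R$, and
diverges if $1 < r \le \infty$, or $|x - c| > R$, which proves the result.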
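As a small illustration of Theorem 10.5, the ratios $|a_n / a_{n+1}|$ can be computed exactly for a few familiar coefficient sequences. This is a sketch only: the function name `ratio_estimate` and the three sample sequences are chosen here for illustration, and evaluating the ratio at one finite $n$ only suggests the limit.

```python
from fractions import Fraction
from math import factorial

def ratio_estimate(a, n):
    """Return |a(n) / a(n+1)|, the n-th term of the sequence whose limit is R."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))

def geometric(n):
    """a_n = 1, so the ratio is always 1 (R = 1)."""
    return Fraction(1)

def harmonic(n):
    """a_n = 1/n for n >= 1, so the ratio is (n+1)/n -> 1 (R = 1)."""
    return Fraction(1, n)

def exponential(n):
    """a_n = 1/n!, so the ratio is n+1 -> infinity (R = infinity)."""
    return Fraction(1, factorial(n))

for name, a in [("geometric", geometric), ("harmonic", harmonic), ("exponential", exponential)]:
    print(name, [ratio_estimate(a, n) for n in (10, 100, 1000)])
```

Using exact rational arithmetic avoids overflow in the factorials and makes the trend of each ratio sequence easy to read off.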


The root test gives an expression for the radius of convergence of a general 
power series. 


Theorem 10.6 (Hadamard). The radius of convergence $R$ of the power series
\[
\sum_{n=0}^{\infty} a_n (x - c)^n
\]
is given by
\[
R = \frac{1}{\limsup_{n \to \infty} |a_n|^{1/n}},
\]
where $R = 0$ if the lim sup diverges to $\infty$, and $R = \infty$ if the lim sup is 0.


Proof. Let
\[
r = \limsup_{n \to \infty} |a_n (x - c)^n|^{1/n} = |x - c| \limsup_{n \to \infty} |a_n|^{1/n}.
\]
By the root test, the series converges if $0 \le r < 1$, or $|x - c| < R$, and diverges if
$1 < r \le \infty$, or $|x - c| > R$, which proves the result.
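The Hadamard formula applies even when the ratio $|a_n / a_{n+1}|$ has no limit. The sketch below uses two coefficient sequences of that kind: a hypothetical "mixed" sequence ($2^n$ for even $n$, $3^n$ for odd $n$) and lacunary coefficients that are $\pm 1$ at powers of 2 and zero otherwise. Taking the maximum of $|a_n|^{1/n}$ over a finite window is only a crude stand-in for the lim sup and is an assumption of this example.

```python
def root_terms(a, n_lo, n_hi):
    """Values |a_n|^(1/n) for n_lo <= n < n_hi."""
    return [abs(a(n)) ** (1.0 / n) for n in range(n_lo, n_hi)]

def mixed(n):
    """Hypothetical coefficients: 2^n for even n, 3^n for odd n."""
    return 2.0 ** n if n % 2 == 0 else 3.0 ** n

def lacunary(n):
    """Coefficients (-1)^k if n = 2^k for some k >= 0, and 0 otherwise."""
    if n > 0 and (n & (n - 1)) == 0:
        return (-1) ** (n.bit_length() - 1)
    return 0

# Finite-window proxy for the lim sup (the window 50..99 is an arbitrary choice).
limsup_mixed = max(root_terms(mixed, 50, 100))
limsup_lacunary = max(root_terms(lacunary, 50, 100))

radius_mixed = 1.0 / limsup_mixed        # close to 1/3
radius_lacunary = 1.0 / limsup_lacunary  # equal to 1
print(radius_mixed, radius_lacunary)
```

For the mixed sequence the root values oscillate between roughly 2 and 3, and it is the larger cluster value that determines the radius.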


This theorem provides an alternate proof of Theorem from the root test; 
in fact, our proof of Theorem is more-or-less a proof of the root test. 


10.3. Examples of power series 


We consider a number of examples of power series and their radii of convergence. 


Example 10.7. The geometric series
\[
\sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \dots
\]
has radius of convergence
\[
R = \lim_{n \to \infty} \frac{1}{1} = 1.
\]
It converges to
\[
\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n \quad \text{for } |x| < 1,
\]
and diverges for $|x| > 1$. At $x = 1$, the series becomes
\[
1 + 1 + 1 + 1 + \dots
\]
and at $x = -1$ it becomes
\[
1 - 1 + 1 - 1 + 1 - \dots,
\]
so the series diverges at both endpoints $x = \pm 1$. Thus, the interval of convergence
of the power series is $(-1, 1)$. The series converges uniformly on $[-\rho, \rho]$ for every
$0 \le \rho < 1$ but does not converge uniformly on $(-1, 1)$ (see Example 9.20). Note
that although the function $1/(1 - x)$ is well-defined for all $x \ne 1$, the power series
only converges when $|x| < 1$.


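Example 10.8. The series
\[
\sum_{n=1}^{\infty} \frac{1}{n} x^n = x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \dots
\]
has radius of convergence
\[
R = \lim_{n \to \infty} \frac{1/n}{1/(n + 1)} = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right) = 1.
\]
At $x = 1$, the series becomes the harmonic series
\[
\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots,
\]
which diverges, and at $x = -1$ it is minus the alternating harmonic series
\[
\sum_{n=1}^{\infty} \frac{(-1)^n}{n} = -1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \dots,
\]
which converges, but not absolutely. Thus the interval of convergence of the power
series is $[-1, 1)$. The series converges uniformly on $[-\rho, \rho]$ for every $0 \le \rho < 1$ but
does not converge uniformly on $(-1, 1)$.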


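Example 10.9. The power series
\[
\sum_{n=0}^{\infty} \frac{1}{n!} x^n = 1 + x + \frac{1}{2!} x^2 + \frac{1}{3!} x^3 + \dots
\]
has radius of convergence
\[
R = \lim_{n \to \infty} \frac{1/n!}{1/(n + 1)!} = \lim_{n \to \infty} \frac{(n + 1)!}{n!} = \lim_{n \to \infty} (n + 1) = \infty,
\]
so it converges for all $x \in \mathbb{R}$. The sum is the exponential function
\[
e^x = \sum_{n=0}^{\infty} \frac{1}{n!} x^n.
\]
In fact, this power series may be used to define the exponential function. (See
Section 10.6.)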


Example 10.10. The power series
\[
\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} = 1 - \frac{1}{2!} x^2 + \frac{1}{4!} x^4 - \dots
\]
has radius of convergence $R = \infty$, and it converges for all $x \in \mathbb{R}$. Its sum $\cos x$
provides an analytic definition of the cosine function.


Example 10.11. The power series
\[
\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1)!} x^{2n+1} = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \dots
\]
has radius of convergence $R = \infty$, and it converges for all $x \in \mathbb{R}$. Its sum $\sin x$
provides an analytic definition of the sine function.


Example 10.12. The power series
\[
\sum_{n=0}^{\infty} (n!) x^n = 1 + x + (2!) x^2 + (3!) x^3 + (4!) x^4 + \dots
\]
has radius of convergence
\[
R = \lim_{n \to \infty} \frac{n!}{(n + 1)!} = \lim_{n \to \infty} \frac{1}{n + 1} = 0,
\]
so it converges only for $x = 0$. If $x \ne 0$, its terms grow larger once $n > 1/|x|$ and
$|(n!) x^n| \to \infty$ as $n \to \infty$.


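Example 10.13. The series
\[
\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} (x - 1)^n = (x - 1) - \frac{1}{2} (x - 1)^2 + \frac{1}{3} (x - 1)^3 - \dots
\]
has radius of convergence
\[
R = \lim_{n \to \infty} \left| \frac{(-1)^{n+1}/n}{(-1)^{n+2}/(n + 1)} \right| = \lim_{n \to \infty} \frac{n + 1}{n} = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right) = 1,
\]
so it converges if $|x - 1| < 1$ and diverges if $|x - 1| > 1$. At the endpoint $x = 2$, the
power series becomes the alternating harmonic series
\[
1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots,
\]
which converges. At the endpoint $x = 0$, the power series becomes the negative of the
harmonic series
\[
-1 - \frac{1}{2} - \frac{1}{3} - \frac{1}{4} - \dots,
\]
which diverges. Thus, the interval of convergence is $(0, 2]$.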


Figure 1. Graph of the lacunary power series $y = \sum_{n=0}^{\infty} (-1)^n x^{2^n}$ on $[0, 1)$.
It appears relatively well-behaved; however, the small oscillations visible near
$x = 1$ are not a numerical artifact.


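Example 10.14. The power series
\[
\sum_{n=0}^{\infty} (-1)^n x^{2^n} = x - x^2 + x^4 - x^8 + x^{16} - x^{32} + \dots = \sum_{n=0}^{\infty} a_n x^n
\]
with
\[
a_n = \begin{cases} (-1)^k & \text{if } n = 2^k, \\ 0 & \text{if } n \ne 2^k, \end{cases}
\]
has radius of convergence $R = 1$. To prove this, note that the series converges for
$|x| < 1$ by comparison with the convergent geometric series $\sum |x|^n$, since
\[
|a_n x^n| \le |x|^n.
\]
If $|x| > 1$, then the terms do not approach 0 as $n \to \infty$, so the series diverges.
Alternatively, we have
\[
|a_n|^{1/n} = \begin{cases} 1 & \text{if } n = 2^k, \\ 0 & \text{if } n \ne 2^k, \end{cases}
\]
so
\[
\limsup_{n \to \infty} |a_n|^{1/n} = 1
\]
and the Hadamard formula (Theorem 10.6) gives $R = 1$. The series does not
converge at either endpoint $x = \pm 1$, so its interval of convergence is $(-1, 1)$.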


In this series, there are successively longer gaps (or "lacunae") between the
powers with non-zero coefficients. Such series are called lacunary power series, and
they have many interesting properties. For example, although the series does not
converge at $x = 1$, one can ask if
\[
\lim_{x \to 1^-} \left[ \sum_{n=0}^{\infty} (-1)^n x^{2^n} \right]
\]
exists. In a plot of this sum on $[0, 1)$, shown in Figure 1, the function appears
relatively well-behaved near $x = 1$. However, Hardy (1907) proved that the function
has infinitely many, very small oscillations as $x \to 1^-$, as illustrated in Figure 2,
and the limit does not exist. Subsequent results by Hardy and Littlewood (1926)
showed, under suitable assumptions on the growth of the "gaps" between non-zero
coefficients, that if the limit of a lacunary power series as $x \to 1^-$ exists, then the
series must converge at $x = 1$. Since the lacunary power series considered here
does not converge at 1, its limit as $x \to 1^-$ cannot exist. For further discussion of
lacunary power series, see [4].

Figure 2. Details of the lacunary power series $\sum_{n=0}^{\infty} (-1)^n x^{2^n}$ near $x = 1$,
showing its oscillatory behavior and the nonexistence of a limit as $x \to 1^-$.


10.4. Algebraic operations on power series 


We can add, multiply, and divide power series in a standard way. For simplicity, 
we consider power series centered at 0. 


Proposition 10.15. If $R, S > 0$ and the functions
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n \quad \text{in } |x| < R, \qquad g(x) = \sum_{n=0}^{\infty} b_n x^n \quad \text{in } |x| < S
\]
are sums of convergent power series, then
\[
(f + g)(x) = \sum_{n=0}^{\infty} (a_n + b_n) x^n \quad \text{in } |x| < T,
\]
\[
(fg)(x) = \sum_{n=0}^{\infty} c_n x^n \quad \text{in } |x| < T,
\]
where $T = \min(R, S)$ and
\[
c_n = \sum_{k=0}^{n} a_{n-k} b_k.
\]


Proof. The power series expansion of $f + g$ follows immediately from the linearity
of limits. The power series expansion of $fg$ follows from the Cauchy product
(Theorem 4.38), since power series converge absolutely inside their intervals of
convergence, and
\[
\left( \sum_{n=0}^{\infty} a_n x^n \right) \left( \sum_{n=0}^{\infty} b_n x^n \right) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_{n-k} x^{n-k} \cdot b_k x^k \right) = \sum_{n=0}^{\infty} c_n x^n.
\]
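The Cauchy product formula $c_n = \sum_{k=0}^{n} a_{n-k} b_k$ is a finite computation for each $n$, so it can be sketched directly on truncated coefficient lists. The helper name `cauchy_product` and the two test series are illustrative choices, not part of the text; only as many product coefficients are returned as both truncations determine.

```python
def cauchy_product(a, b):
    """Coefficients c_n = sum_{k=0}^{n} a[n-k] * b[k] of the product series.

    Only the first min(len(a), len(b)) coefficients are returned, since those
    are the ones fully determined by the truncated inputs.
    """
    n_terms = min(len(a), len(b))
    return [sum(a[n - k] * b[k] for k in range(n + 1)) for n in range(n_terms)]

geom = [1] * 6                       # 1/(1 - x) = 1 + x + x^2 + ...
one_minus_x = [1, -1, 0, 0, 0, 0]    # the polynomial 1 - x

square = cauchy_product(geom, geom)            # coefficients of 1/(1 - x)^2
identity = cauchy_product(one_minus_x, geom)   # coefficients of (1 - x) * 1/(1 - x)
print(square, identity)
```

The second product collapses to the constant series 1, which is the coefficient-level form of $(1 - x) \cdot \frac{1}{1 - x} = 1$.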


It may happen that the radius of convergence of the power series for $f + g$ or $fg$
is larger than the radius of convergence of the power series for $f$, $g$. For example,
if $g = -f$, then the radius of convergence of the power series for $f + g = 0$ is $\infty$
whatever the radius of convergence of the power series for $f$.

The reciprocal of a convergent power series that is nonzero at its center also 
has a power series expansion. 


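Proposition 10.16. If $R > 0$ and
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n \quad \text{in } |x| < R
\]
is the sum of a power series with $a_0 \ne 0$, then there exists $S > 0$ such that
\[
\frac{1}{f(x)} = \sum_{n=0}^{\infty} b_n x^n \quad \text{in } |x| < S.
\]
The coefficients $b_n$ are determined recursively by
\[
b_0 = \frac{1}{a_0}, \qquad b_n = -\frac{1}{a_0} \sum_{k=0}^{n-1} a_{n-k} b_k \quad \text{for } n \ge 1.
\]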


Proof. First, we look for a formal power series expansion (i.e., without regard to
its convergence)
\[
g(x) = \sum_{n=0}^{\infty} b_n x^n
\]
such that the formal Cauchy product $fg$ is equal to 1. This condition is satisfied if
\[
\left( \sum_{n=0}^{\infty} a_n x^n \right) \left( \sum_{n=0}^{\infty} b_n x^n \right) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_{n-k} b_k \right) x^n = 1.
\]
Matching the coefficients of $x^n$, we find that
\[
a_0 b_0 = 1, \qquad a_0 b_n + \sum_{k=0}^{n-1} a_{n-k} b_k = 0 \quad \text{for } n \ge 1,
\]
which gives the stated recursion relation.

To complete the proof, we need to show that the formal power series for $g$ has
a nonzero radius of convergence. In that case, Proposition 10.15 shows that $fg = 1$
inside the common interval of convergence of $f$ and $g$, so $1/f = g$ has a power series
expansion. We assume without loss of generality that $a_0 = 1$; otherwise replace $f$
by $f/a_0$.

The power series for $f$ converges absolutely and uniformly on compact sets
inside its interval of convergence, so the function
\[
\sum_{n=1}^{\infty} |a_n| |x|^n
\]
is continuous in $|x| < R$ and vanishes at $x = 0$. It follows that there exists $\delta > 0$
such that
\[
\sum_{n=1}^{\infty} |a_n| |x|^n \le 1 \quad \text{for } |x| \le \delta.
\]
Then $f(x) \ne 0$ for $|x| < \delta$, since
\[
|f(x)| \ge 1 - \sum_{n=1}^{\infty} |a_n| |x|^n > 0,
\]
so $1/f(x)$ is well defined.

We claim that
\[
|b_n| \le \frac{1}{\delta^n} \quad \text{for } n = 0, 1, 2, \dots.
\]
The proof is by induction. Since $b_0 = 1$, this inequality is true for $n = 0$. If $n \ge 1$
and the inequality holds for $b_k$ with $0 \le k \le n - 1$, then by taking the absolute
value of the recursion relation for $b_n$, we get
\[
|b_n| \le \sum_{k=1}^{n} |a_k| |b_{n-k}| \le \sum_{k=1}^{n} \frac{|a_k|}{\delta^{n-k}} = \frac{1}{\delta^n} \sum_{k=1}^{n} |a_k| \delta^k \le \frac{1}{\delta^n},
\]
so the inequality holds for $b_k$ with $0 \le k \le n$, and the claim follows.

We then get that
\[
\limsup_{n \to \infty} |b_n|^{1/n} \le \frac{1}{\delta},
\]
so the Hadamard formula in Theorem 10.6 implies that the radius of convergence
of $\sum b_n x^n$ is greater than or equal to $\delta > 0$, which completes the proof.
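The recursion for the coefficients of $1/f$ in Proposition 10.16 can be run directly on a truncated coefficient list. The sketch below uses exact rational arithmetic; the function name `reciprocal_coefficients` and the two sample inputs ($f(x) = 1 - x$ and the exponential series) are choices made here for illustration.

```python
from fractions import Fraction
from math import factorial

def reciprocal_coefficients(a, n_terms):
    """First n_terms coefficients b_n of 1/f, where f has coefficients a (a[0] != 0).

    Uses b_0 = 1/a_0 and b_n = -(1/a_0) * sum_{k=0}^{n-1} a_{n-k} b_k.
    The list a must contain at least n_terms entries.
    """
    b = [Fraction(1) / a[0]]
    for n in range(1, n_terms):
        s = sum(a[n - k] * b[k] for k in range(n))
        b.append(-s / a[0])
    return b

N = 8
# f(x) = 1 - x, whose reciprocal is the geometric series 1 + x + x^2 + ...
one_minus_x = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (N - 2)
geometric = reciprocal_coefficients(one_minus_x, N)

# f(x) = e^x, whose reciprocal e^{-x} has coefficients (-1)^n / n!
exp_coeffs = [Fraction(1, factorial(n)) for n in range(N)]
exp_reciprocal = reciprocal_coefficients(exp_coeffs, N)
print(geometric, exp_reciprocal)
```

The recursion produces the formal reciprocal only; the convergence of the resulting series is what the estimate $|b_n| \le 1/\delta^n$ in the proof supplies.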


An immediate consequence of these results for products and reciprocals of power 
series is that quotients of convergent power series are given by convergent power 
series, provided that the denominator is nonzero. 


Proposition 10.17. If $R, S > 0$ and
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n \quad \text{in } |x| < R, \qquad g(x) = \sum_{n=0}^{\infty} b_n x^n \quad \text{in } |x| < S
\]
are the sums of power series with $b_0 \ne 0$, then there exists $T > 0$ and coefficients
$c_n$ such that
\[
\frac{f(x)}{g(x)} = \sum_{n=0}^{\infty} c_n x^n \quad \text{in } |x| < T.
\]


The previous results do not give an explicit expression for the coefficients in
the power series expansion of $f/g$ or a sharp estimate for its radius of convergence.
Using complex analysis, one can show that the radius of convergence of the power
series for $f/g$ centered at 0 is equal to the distance from the origin of the nearest
singularity of $f/g$ in the complex plane. We will not discuss complex analysis here,
but we consider two examples.


Example 10.18. Replacing x by —x? in the geometric power series from Exam- 
ple we get the following power series centered at 0 


\[
\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \dots = \sum_{n=0}^{\infty} (-1)^n x^{2n},
\]
which has radius of convergence $R = 1$. From the point of view of real functions, it
may appear strange that the radius of convergence is 1, since the function $1/(1 + x^2)$
is well-defined on $\mathbb{R}$, has continuous derivatives of all orders, and has power series
expansions with nonzero radius of convergence centered at every $c \in \mathbb{R}$. However,
when $1/(1 + z^2)$ is regarded as a function of a complex variable $z \in \mathbb{C}$, one sees
that it has singularities at $z = \pm i$, where the denominator vanishes, and $|\pm i| = 1$,
which explains why $R = 1$.


Example 10.19. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(0) = 1$ and
\[
f(x) = \frac{e^x - 1}{x} \quad \text{for } x \ne 0
\]
has the power series expansion
\[
f(x) = \sum_{n=0}^{\infty} \frac{1}{(n + 1)!} x^n,
\]
with infinite radius of convergence. The reciprocal function $g : \mathbb{R} \to \mathbb{R}$ of $f$ is given
by $g(0) = 1$ and
\[
g(x) = \frac{x}{e^x - 1} \quad \text{for } x \ne 0.
\]
Proposition 10.16 implies that
\[
g(x) = \sum_{n=0}^{\infty} b_n x^n
\]
has a convergent power series expansion at 0, with $b_0 = 1$ and
\[
b_n = -\sum_{k=0}^{n-1} \frac{b_k}{(n - k + 1)!} \quad \text{for } n \ge 1.
\]

The numbers $B_n = n! \, b_n$ are called Bernoulli numbers. They may be defined
as the coefficients in the power series expansion
\[
\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!} x^n.
\]
The function $x/(e^x - 1)$ is called the generating function of the Bernoulli numbers,
where we adopt the convention that $x/(e^x - 1) = 1$ at $x = 0$.


A number of properties of the Bernoulli numbers follow from their generating
function. First, we observe that
\[
\frac{x}{e^x - 1} + \frac{x}{2} = \frac{x}{2} \left( \frac{e^{x/2} + e^{-x/2}}{e^{x/2} - e^{-x/2}} \right)
\]
is an even function of $x$. It follows that
\[
B_0 = 1, \qquad B_1 = -\frac{1}{2},
\]
and $B_n = 0$ for all odd $n \ge 3$. Thus, the power series expansion of $x/(e^x - 1)$ has
the form
\[
\frac{x}{e^x - 1} = 1 - \frac{1}{2} x + \sum_{n=1}^{\infty} \frac{B_{2n}}{(2n)!} x^{2n}.
\]
The recursion formula for $b_n$ can be written in terms of $B_n$ as
\[
\sum_{k=0}^{n} \binom{n + 1}{k} B_k = 0,
\]
which implies that the Bernoulli numbers are rational. For example, one finds that
\[
B_2 = \frac{1}{6}, \quad B_4 = -\frac{1}{30}, \quad B_6 = \frac{1}{42}, \quad B_8 = -\frac{1}{30}, \quad B_{10} = \frac{5}{66}, \quad B_{12} = -\frac{691}{2730}.
\]
As the sudden appearance of the large irregular prime number 691 in the numerator
of $B_{12}$ suggests, there is no simple pattern for the values of $B_{2n}$, although
they continue to alternate in sign.¹ The Bernoulli numbers have many surprising
connections with number theory and other areas of mathematics; for example, as
noted in Section 4.5, they give the values of the Riemann zeta function at even
natural numbers.
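The binomial form of the recursion, $\sum_{k=0}^{n} \binom{n+1}{k} B_k = 0$ for $n \ge 1$, can be solved for $B_n$ because the coefficient of $B_n$ is $\binom{n+1}{n} = n + 1$. The following sketch computes the first few Bernoulli numbers exactly; the function name `bernoulli_numbers` is chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] using sum_{k=0}^{n} C(n+1, k) B_k = 0 for n >= 1.

    Solving for the k = n term gives B_n = -(1/(n+1)) * sum_{k<n} C(n+1, k) B_k.
    """
    B = [Fraction(1)]  # B_0 = 1
    for n in range(1, n_max + 1):
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(-s / (n + 1))
    return B

B = bernoulli_numbers(12)
for n in (0, 1, 2, 4, 6, 8, 10, 12):
    print(n, B[n])
```

Since every step uses only rational operations on earlier values, the computation also makes the rationality of the Bernoulli numbers visible.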


Using complex analysis, one can show that the radius of convergence of the
power series for $z/(e^z - 1)$ at $z = 0$ is equal to $2\pi$, since the closest zeros of the
denominator $e^z - 1$ to the origin in the complex plane occur at $z = \pm 2\pi i$, where
$|z| = 2\pi$. Given this fact, the Hadamard formula (Theorem 10.6) implies that
\[
\limsup_{n \to \infty} \left| \frac{B_n}{n!} \right|^{1/n} = \frac{1}{2\pi},
\]
which shows that at least some of the Bernoulli numbers $B_n$ grow very rapidly
(factorially) as $n \to \infty$.


Finally, we remark that we have proved that algebraic operations on convergent
power series lead to convergent power series. If one is interested only in the formal
algebraic properties of power series, and not their convergence, one can introduce
a purely algebraic structure called the ring of formal power series (over the field $\mathbb{R}$)
in a variable $x$,
\[
\mathbb{R}[[x]] = \left\{ \sum_{n=0}^{\infty} a_n x^n : a_n \in \mathbb{R} \right\},
\]
with sums and products on $\mathbb{R}[[x]]$ defined in the obvious way:
\[
\sum_{n=0}^{\infty} a_n x^n + \sum_{n=0}^{\infty} b_n x^n = \sum_{n=0}^{\infty} (a_n + b_n) x^n,
\]
\[
\left( \sum_{n=0}^{\infty} a_n x^n \right) \left( \sum_{n=0}^{\infty} b_n x^n \right) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_{n-k} b_k \right) x^n.
\]

¹ A prime number $p$ is said to be irregular if it divides the numerator of $B_{2n}$, expressed in lowest
terms, for some $2 \le 2n \le p - 3$; otherwise it is regular. The smallest irregular prime number is 37,
which divides the numerator of $B_{32} = -7709321041217/510$, since $7709321041217 = 37 \cdot 683 \cdot 305065927$.
There are infinitely many irregular primes, and it is conjectured that there are infinitely many regular
primes. A proof of this conjecture is, however, an open problem.


10.5. Differentiation of power series 


We saw in Section that, in general, one cannot differentiate a uniformly convergent
sequence or series. We can, however, differentiate power series, and they
behave as nicely as one can imagine in this respect. The sum of a power series
\[
f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots
\]
is infinitely differentiable inside its interval of convergence, and its derivative
\[
f'(x) = a_1 + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + \dots
\]

is given by term-by-term differentiation. To prove this result, we first show that 
the term-by-term derivative of a power series has the same radius of convergence 
as the original power series. The idea is that the geometrical decay of the terms of 
the power series inside its radius of convergence dominates the algebraic growth of 
the factor n that comes from taking the derivative. 


Theorem 10.20. Suppose that the power series
\[
\sum_{n=0}^{\infty} a_n (x - c)^n
\]
has radius of convergence $R$. Then the power series
\[
\sum_{n=1}^{\infty} n a_n (x - c)^{n-1}
\]
also has radius of convergence $R$.


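Proof. Assume without loss of generality that $c = 0$, and suppose $|x| < R$. Choose
$\rho$ such that $|x| < \rho < R$, and let
\[
r = \frac{|x|}{\rho}, \qquad 0 < r < 1.
\]
To estimate the terms in the differentiated power series by the terms in the original
series, we rewrite their absolute values as follows:
\[
|n a_n x^{n-1}| = \frac{n}{\rho} \left( \frac{|x|}{\rho} \right)^{n-1} |a_n \rho^n| = \frac{n r^{n-1}}{\rho} |a_n \rho^n|.
\]
The ratio test shows that the series $\sum n r^{n-1}$ converges, since
\[
\lim_{n \to \infty} \frac{(n + 1) r^n}{n r^{n-1}} = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right) r = r < 1,
\]
so the sequence $(n r^{n-1})$ is bounded, by $M$ say. It follows that
\[
|n a_n x^{n-1}| \le \frac{M}{\rho} |a_n \rho^n| \quad \text{for all } n \in \mathbb{N}.
\]
The series $\sum |a_n \rho^n|$ converges, since $\rho < R$, so the comparison test implies that
$\sum n a_n x^{n-1}$ converges absolutely.

Conversely, suppose $|x| > R$. Then $\sum |a_n x^n|$ diverges (since $\sum a_n x^n$ diverges)
and
\[
|n a_n x^{n-1}| \ge \frac{1}{|x|} |a_n x^n|
\]
for $n \ge 1$, so the comparison test implies that $\sum |n a_n x^{n-1}|$ diverges. Since a power
series converges absolutely inside its radius of convergence, the differentiated series
cannot have radius of convergence greater than $R$. Thus the series
have the same radius of convergence.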


Theorem 10.21. Suppose that the power series
\[
f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n \quad \text{for } |x - c| < R
\]
has radius of convergence $R > 0$ and sum $f$. Then $f$ is differentiable in $|x - c| < R$
and
\[
f'(x) = \sum_{n=1}^{\infty} n a_n (x - c)^{n-1} \quad \text{for } |x - c| < R.
\]
n=1 


Proof. The term-by-term differentiated power series converges in $|x - c| < R$ by
Theorem 10.20. We denote its sum by
\[
g(x) = \sum_{n=1}^{\infty} n a_n (x - c)^{n-1}.
\]
Let $0 < \rho < R$. Then, by Theorem 10.3, the power series for $f$ and $g$ both converge
uniformly in $|x - c| \le \rho$. Applying Theorem 9.18 to their partial sums, we conclude
that $f$ is differentiable in $|x - c| < \rho$ and $f' = g$. Since this holds for every
$0 < \rho < R$, it follows that $f$ is differentiable in $|x - c| < R$ and $f' = g$, which proves
the result.
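Term-by-term differentiation acts on coefficients by $a_n \mapsto n a_n$ followed by a shift of index. As a numerical sanity check (a sketch only, with a 200-term truncation and the point $x = 0.5$ chosen arbitrarily), we can differentiate the geometric series and compare with the derivative $1/(1 - x)^2$ of its sum.

```python
def differentiate(coeffs):
    """Coefficients of the term-by-term derivative: [a_1, 2 a_2, 3 a_3, ...]."""
    return [n * a for n, a in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the truncated series sum a_n x^n by Horner's rule."""
    total = 0.0
    for a in reversed(coeffs):
        total = total * x + a
    return total

geom = [1.0] * 200            # truncated geometric series, sum 1/(1 - x) for |x| < 1
x = 0.5
approx = evaluate(differentiate(geom), x)   # truncated sum of n x^{n-1}
exact = 1.0 / (1.0 - x) ** 2                # derivative of 1/(1 - x)
print(approx, exact)
```

Inside the interval of convergence the truncation error is geometrically small, which reflects the geometric decay that dominates the factor $n$ in the proof of Theorem 10.20.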


Repeated application of Theorem 10.21 implies that the sum of a power series
is infinitely differentiable inside its interval of convergence and its derivatives are
given by term-by-term differentiation of the power series. Furthermore, we can get
an expression for the coefficients $a_n$ in terms of the function $f$; they are simply the
Taylor coefficients of $f$ at $c$.


Theorem 10.22. If the power series
\[
f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n
\]
has radius of convergence $R > 0$, then $f$ is infinitely differentiable in $|x - c| < R$
and
\[
a_n = \frac{f^{(n)}(c)}{n!}.
\]


Proof. We assume $c = 0$ without loss of generality. Applying Theorem 10.21 to
the power series
\[
f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n + \dots
\]
$k$ times, we find that $f$ has derivatives of every order in $|x| < R$, and
\begin{align*}
f'(x) &= a_1 + 2 a_2 x + 3 a_3 x^2 + \dots + n a_n x^{n-1} + \dots, \\
f''(x) &= 2 a_2 + (3 \cdot 2) a_3 x + \dots + n(n - 1) a_n x^{n-2} + \dots, \\
f'''(x) &= (3 \cdot 2 \cdot 1) a_3 + \dots + n(n - 1)(n - 2) a_n x^{n-3} + \dots, \\
&\ \ \vdots \\
f^{(k)}(x) &= (k!) a_k + \dots + \frac{n!}{(n - k)!} a_n x^{n-k} + \dots,
\end{align*}
where all of these power series have radius of convergence $R$. Setting $x = 0$ in these
series, we get
\[
a_0 = f(0), \quad a_1 = f'(0), \quad \dots, \quad a_k = \frac{f^{(k)}(0)}{k!},
\]
which proves the result (after replacing 0 by $c$).


One consequence of this result is that power series with different coefficients 
cannot converge to the same sum. 


Corollary 10.23. If two power series
\[
\sum_{n=0}^{\infty} a_n (x - c)^n, \qquad \sum_{n=0}^{\infty} b_n (x - c)^n
\]
have nonzero radius of convergence and are equal in some neighborhood of $c$, then
$a_n = b_n$ for every $n = 0, 1, 2, \dots$.


Proof. If the common sum in $|x - c| < \delta$ is $f(x)$, we have
\[
a_n = \frac{f^{(n)}(c)}{n!}, \qquad b_n = \frac{f^{(n)}(c)}{n!},
\]
since the derivatives of $f$ at $c$ are determined by the values of $f$ in an arbitrarily
small open interval about $c$, so the coefficients are equal.


10.6. The exponential function 


We showed in Example 10.9 that the power series
\[
E(x) = 1 + x + \frac{1}{2!} x^2 + \frac{1}{3!} x^3 + \dots + \frac{1}{n!} x^n + \dots
\]
has radius of convergence $\infty$. It therefore defines an infinitely differentiable function
$E : \mathbb{R} \to \mathbb{R}$.

Term-by-term differentiation of the power series, which is justified by Theorem 10.21,
implies that
\[
E'(x) = 1 + x + \frac{1}{2!} x^2 + \dots + \frac{1}{(n - 1)!} x^{n-1} + \dots,
\]
so $E' = E$. Moreover $E(0) = 1$. As we show below, there is a unique function
with these properties, which are shared by the exponential function $e^x$. Thus,
this power series provides an analytical definition of $e^x = E(x)$. All of the other
familiar properties of the exponential follow from its power-series definition, and
we will prove a few of them here.

First, we show that $e^x e^y = e^{x+y}$. For the moment, we continue to write the
function $e^x$ as $E(x)$ to emphasise that we use nothing beyond its power series
definition.


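Proposition 10.24. For every $x, y \in \mathbb{R}$,
\[
E(x) E(y) = E(x + y).
\]

Proof. We have
\[
E(x) = \sum_{j=0}^{\infty} \frac{x^j}{j!}, \qquad E(y) = \sum_{k=0}^{\infty} \frac{y^k}{k!}.
\]
Multiplying these series term-by-term and rearranging the sum as a Cauchy product,
which is justified by Theorem 4.38, we get
\[
E(x) E(y) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \frac{x^j y^k}{j! \, k!} = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{x^{n-k} y^k}{(n - k)! \, k!}.
\]
From the binomial theorem,
\[
\sum_{k=0}^{n} \frac{x^{n-k} y^k}{(n - k)! \, k!} = \frac{1}{n!} \sum_{k=0}^{n} \frac{n!}{(n - k)! \, k!} x^{n-k} y^k = \frac{1}{n!} (x + y)^n.
\]
Hence,
\[
E(x) E(y) = \sum_{n=0}^{\infty} \frac{(x + y)^n}{n!} = E(x + y),
\]
which proves the result.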
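The identity $E(x)E(y) = E(x + y)$ can be observed numerically using nothing but partial sums of the defining series. This is a sketch: the 40-term truncation and the sample values of $x$ and $y$ are arbitrary choices, and agreement up to rounding error illustrates, but does not prove, the proposition.

```python
import math

def E(x, n_terms=40):
    """Partial sum 1 + x + x^2/2! + ... with n_terms terms of the exponential series."""
    term, total = 1.0, 1.0
    for n in range(1, n_terms):
        term *= x / n      # term is now x^n / n!
        total += term
    return total

x, y = 1.3, -0.7
lhs = E(x) * E(y)
rhs = E(x + y)
print(lhs, rhs, math.exp(x + y))
```

The comparison with `math.exp` is included only as an independent reference value; the check of the functional equation itself uses the series alone.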


In particular, since $E(0) = 1$, it follows that
\[
E(-x) = \frac{1}{E(x)}.
\]
We have $E(x) > 0$ for all $x \ge 0$, since all of the terms in its power series are non-negative
and the first term is 1, so $E(x) > 0$ for all $x \in \mathbb{R}$.


Next, we prove that the exponential is characterized by the properties $E' = E$
and $E(0) = 1$. This is a simple uniqueness result for an initial value problem for a
linear ordinary differential equation.


Proposition 10.25. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is a differentiable function such that
\[
f' = f, \qquad f(0) = 1.
\]
Then $f = E$.


Proof. Suppose that $f' = f$. Using the equation $E' = E$, the fact that $E$ is
nonzero on $\mathbb{R}$, and the quotient rule, we get
\[
\left( \frac{f}{E} \right)' = \frac{f' E - f E'}{E^2} = \frac{f E - f E}{E^2} = 0.
\]
It follows from Theorem 8.34 that $f/E$ is constant on $\mathbb{R}$. Since $f(0) = E(0) = 1$,
we have $f/E = 1$, which implies that $f = E$.


In view of this result, we now write $E(x) = e^x$. The following proposition,
which we use below in Section 10.7.2, shows that $e^x$ grows faster than any power
of $x$ as $x \to \infty$.


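Proposition 10.26. Suppose that $n$ is a non-negative integer. Then
\[
\lim_{x \to \infty} \frac{x^n}{e^x} = 0.
\]

Proof. The terms in the power series of $e^x$ are positive for $x > 0$, so for every
$k \in \mathbb{N}$
\[
e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!} > \frac{x^k}{k!} \quad \text{for all } x > 0.
\]
Taking $k = n + 1$, we get for $x > 0$ that
\[
0 < \frac{x^n}{e^x} < \frac{x^n}{x^{n+1}/(n + 1)!} = \frac{(n + 1)!}{x}.
\]
Since $1/x \to 0$ as $x \to \infty$, the result follows.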
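The squeeze $0 < x^n / e^x < (n + 1)!/x$ from the proof is easy to tabulate. In the sketch below the exponent $n = 5$ and the sample points are arbitrary choices; the values of $x$ are kept below about 700 so that `math.exp` does not overflow in double precision.

```python
import math

def ratio(x, n):
    """The quantity x^n / e^x from Proposition 10.26."""
    return x**n / math.exp(x)

def bound(x, n):
    """The upper bound (n+1)!/x obtained from e^x > x^(n+1)/(n+1)!."""
    return math.factorial(n + 1) / x

n = 5
samples = [1.0, 10.0, 100.0, 500.0]   # hypothetical sample points
table = [(x, ratio(x, n), bound(x, n)) for x in samples]
for row in table:
    print(row)
```

The bound is far from sharp, but it decays like $1/x$, which is all the proof needs.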


The logarithm $\log : (0, \infty) \to \mathbb{R}$ can be defined as the inverse of the exponential
function $\exp : \mathbb{R} \to (0, \infty)$, which is strictly increasing on $\mathbb{R}$ since its derivative is
strictly positive. Having the logarithm and the exponential, we can define the power
function for all exponents $p \in \mathbb{R}$ by
\[
x^p = e^{p \log x}, \qquad x > 0.
\]


Other transcendental functions, such as the trigonometric functions, can be defined 
in terms of their power series, and these can be used to prove their usual properties. 
We will not carry all this out in detail; we just want to emphasize that, once we 
have developed the theory of power series, we can define all of the functions arising 
in elementary calculus from the first principles of analysis. 


10.7. * Smooth versus analytic functions 


The power series theorem, Theorem 10.22, looks similar to Taylor's theorem,
but there is a fundamental difference. Taylor's theorem gives an expression
for the error between a function and its Taylor polynomials. No question
of convergence is involved. On the other hand, Theorem 10.22 asserts the convergence
of an infinite power series to a function $f$. The coefficients of the Taylor
polynomials and the power series are the same in both cases, but Taylor's theorem
approximates $f$ by its Taylor polynomials $P_n(x)$ of degree $n$ at $c$ in the limit $x \to c$
with $n$ fixed, while the power series theorem approximates $f$ by $P_n(x)$ in the limit
$n \to \infty$ with $x$ fixed.


10.7.1. Taylor’s theorem and power series. To explain the difference between 
Taylor’s theorem and power series in more detail, we introduce an important dis- 
tinction between smooth and analytic functions: smooth functions have continuous 
derivatives of all orders, while analytic functions are sums of power series. 


Definition 10.27. Let $k \in \mathbb{N}$. A function $f : (a, b) \to \mathbb{R}$ is $C^k$ on $(a, b)$, written
$f \in C^k(a, b)$, if it has continuous derivatives $f^{(j)} : (a, b) \to \mathbb{R}$ of orders $1 \le j \le k$.
A function $f$ is smooth (or $C^\infty$, or infinitely differentiable) on $(a, b)$, written
$f \in C^\infty(a, b)$, if it has continuous derivatives of all orders on $(a, b)$.


In fact, if $f$ has derivatives of all orders, then they are automatically continuous,
since the differentiability of $f^{(k)}$ implies its continuity; on the other hand, the
existence of $k$ derivatives of $f$ does not imply the continuity of $f^{(k)}$. The statement
"$f$ is smooth" is sometimes used rather loosely to mean "$f$ has as many continuous
derivatives as we want," but we will use it to mean that $f$ is $C^\infty$.


Definition 10.28. A function $f : (a, b) \to \mathbb{R}$ is analytic on $(a, b)$ if for every
$c \in (a, b)$ the function $f$ is the sum in a neighborhood of $c$ of a power series
centered at $c$ with nonzero radius of convergence.


Strictly speaking, this is the definition of a real analytic function, and analytic 
functions are complex functions that are sums of power series. Since we consider 
only real functions here, we abbreviate “real analytic” to “analytic.” 


Theorem 10.22 implies that an analytic function is smooth: If $f$ is analytic on
$(a, b)$ and $c \in (a, b)$, then there is an $R > 0$ and coefficients $(a_n)$ such that
\[
f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n \quad \text{for } |x - c| < R.
\]
Then Theorem 10.22 implies that $f$ has derivatives of all orders in $|x - c| < R$, and
since $c \in (a, b)$ is arbitrary, $f$ has derivatives of all orders in $(a, b)$. Moreover, it
follows that the coefficients $a_n$ in the power series expansion of $f$ at $c$ are given by
Taylor's formula.


What is less obvious is that a smooth function need not be analytic. If $f$ is
smooth, then we can define its Taylor coefficients $a_n = f^{(n)}(c)/n!$ at $c$ for every
$n \ge 0$, and write down the corresponding Taylor series $\sum a_n (x - c)^n$. The problem
is that the Taylor series may have zero radius of convergence if the derivatives of $f$
grow too rapidly as $n \to \infty$, in which case it diverges for every $x \ne c$, or the Taylor
series may converge, but not to $f$.


10.7.2. A smooth, non-analytic function. In this section, we give an example 
of a smooth function that is not the sum of its Taylor series. 


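It follows from Proposition 10.26 that if
\[
p(x) = \sum_{k=0}^{n} a_k x^k
\]
is any polynomial function, then
\[
\lim_{x \to \infty} \frac{p(x)}{e^x} = \sum_{k=0}^{n} a_k \lim_{x \to \infty} \frac{x^k}{e^x} = 0.
\]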

Figure 3. Left: Plot $y = \phi(x)$ of the smooth, non-analytic function defined
in Proposition 10.29. Right: A detail of the function near $x = 0$. The dotted
line is the power-function $y = x^6/50$. The graph of $\phi$ near 0 is “flatter” than
the graph of the power-function, illustrating that $\phi(x)$ goes to zero faster than
any power of $x$ as $x \to 0^+$.


We will use this limit to exhibit a non-zero function that approaches zero faster
than every power of $x$ as $x \to 0^+$. As a result, all of its derivatives at 0 vanish, even
though the function itself does not vanish in any neighborhood of 0. (See Figure 3.)


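Proposition 10.29. Define $\phi : \mathbb{R} \to \mathbb{R}$ by
\[
\phi(x) = \begin{cases} \exp(-1/x) & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases}
\]
Then $\phi$ has derivatives of all orders on $\mathbb{R}$ and
\[
\phi^{(n)}(0) = 0 \quad \text{for all } n \ge 0.
\]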


Proof. The infinite differentiability of $\phi(x)$ at $x \ne 0$ follows from the chain rule.
Moreover, its $n$th derivative has the form
\[
\phi^{(n)}(x) = \begin{cases} p_n(1/x) \exp(-1/x) & \text{if } x > 0, \\ 0 & \text{if } x < 0, \end{cases}
\]
where $p_n(1/x)$ is a polynomial of degree $2n$ in $1/x$. This follows, for example, by
induction, since differentiation of $\phi^{(n)}$ shows that $p_n$ satisfies the recursion relation
\[
p_{n+1}(z) = z^2 \left[ p_n(z) - p_n'(z) \right], \qquad p_0(z) = 1.
\]
Thus, we just have to show that $\phi$ has derivatives of all orders at 0, and that these
derivatives are equal to zero.

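First, consider $\phi'(0)$. The left derivative $\phi'(0^-)$ of $\phi$ at 0 is 0 since $\phi(0) = 0$
and $\phi(h) = 0$ for all $h < 0$. To find the right derivative, we write $1/h = x$ and use
Proposition 10.26, which gives
\[
\phi'(0^+) = \lim_{h \to 0^+} \frac{\phi(h) - \phi(0)}{h}
= \lim_{h \to 0^+} \frac{\exp(-1/h)}{h}
= \lim_{x \to \infty} \frac{x}{e^x} = 0.
\]
Since both the left and right derivatives equal zero, we have $\phi'(0) = 0$.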

To show that all the derivatives of $\phi$ at 0 exist and are zero, we use a proof
by induction. Suppose that $\phi^{(n)}(0) = 0$, which we have verified for $n = 1$. The
left derivative $\phi^{(n+1)}(0^-)$ is clearly zero, so we just need to prove that the right
derivative is zero. Using the form of $\phi^{(n)}(h)$ for $h > 0$ and Proposition 10.26, we
get that
\[
\phi^{(n+1)}(0^+) = \lim_{h \to 0^+} \frac{\phi^{(n)}(h) - \phi^{(n)}(0)}{h}
= \lim_{h \to 0^+} \frac{p_n(1/h) \exp(-1/h)}{h}
= \lim_{x \to \infty} \frac{x\, p_n(x)}{e^x} = 0,
\]
which proves the result.
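
The recursion $p_{n+1}(z) = z^2 [p_n(z) - p_n'(z)]$, $p_0(z) = 1$, used in the proof can be iterated mechanically. The following Python sketch is only an illustration and not part of the proof: it stores a polynomial as a list of integer coefficients (a representation chosen here for convenience, with hypothetical helper names) and checks that the first few $p_n$ have degree $2n$.

```python
# Polynomials in z are stored as coefficient lists: poly[i] multiplies z**i.

def derivative(poly):
    """Coefficient list of the derivative of poly."""
    return [i * c for i, c in enumerate(poly)][1:] or [0]

def next_p(poly):
    """Apply the recursion p_{n+1}(z) = z**2 * (p_n(z) - p_n'(z))."""
    d = derivative(poly)
    d = d + [0] * (len(poly) - len(d))        # pad p_n' to the length of p_n
    diff = [a - b for a, b in zip(poly, d)]   # coefficients of p_n - p_n'
    return [0, 0] + diff                      # multiplying by z**2 shifts by two

def p(n):
    """Coefficient list of p_n, starting from p_0(z) = 1."""
    poly = [1]
    for _ in range(n):
        poly = next_p(poly)
    return poly

def degree(poly):
    """Index of the highest nonzero coefficient."""
    return max(i for i, c in enumerate(poly) if c != 0)

for n in range(5):
    print(n, degree(p(n)), p(n))
```

For instance, the sketch gives $p_1(z) = z^2$ and $p_2(z) = z^4 - 2z^3$, in agreement with differentiating $\exp(-1/x)$ by hand.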


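Corollary 10.30. The function $\phi : \mathbb{R} \to \mathbb{R}$ defined by
\[
\phi(x) = \begin{cases} \exp(-1/x) & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases}
\]
is smooth but not analytic on $\mathbb{R}$.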


Proof. From Proposition 10.29, the function $\phi$ is smooth, and the $n$th Taylor
coefficient of $\phi$ at 0 is $a_n = 0$. The Taylor series of $\phi$ at 0 therefore converges to
0, so its sum is not equal to $\phi$ in any neighborhood of 0, meaning that $\phi$ is not
analytic at 0.


The fact that the Taylor polynomial of $\phi$ at 0 is zero for every degree $n \in \mathbb{N}$
does not contradict Taylor’s theorem, which says that for every $n \in \mathbb{N}$ and $x > 0$
there exists $0 < \xi < x$ such that
\[
\phi(x) = \frac{\phi^{(n)}(\xi)}{n!} x^n.
\]


Since the derivatives of $\phi$ are bounded, it follows that there is a constant $C_n$,
depending on $n$, such that
\[
|\phi(x)| \le C_n x^n \quad \text{for all } 0 < x < \infty.
\]
Thus, $\phi(x) \to 0$ as $x \to 0^+$ faster than any power of $x$. But this inequality does not
imply that $\phi(x) = 0$ for $x > 0$ since $C_n$ grows rapidly as $n$ increases, and $C_n x^n \not\to 0$
as $n \to \infty$ for any $x > 0$, however small.


We can construct other smooth, non-analytic functions from $\phi$.


Example 10.31. The function
\[
\psi(x) = \begin{cases} \exp(-1/x^2) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}
\]
is infinitely differentiable on $\mathbb{R}$, since $\psi(x) = \phi(x^2)$ is a composition of smooth
functions.


The function in the next example is useful in many parts of analysis. Before 
giving the example, we introduce some terminology. 


Definition 10.32. A function $f : \mathbb{R} \to \mathbb{R}$ has compact support if there exists $R \ge 0$
such that $f(x) = 0$ for all $x \in \mathbb{R}$ with $|x| \ge R$.


It isn’t hard to construct continuous functions with compact support; one
example that vanishes for $|x| \ge 1$ is the piecewise-linear, triangular (or ‘tent’) function
\[
f(x) = \begin{cases} 1 - |x| & \text{if } |x| < 1, \\ 0 & \text{if } |x| \ge 1. \end{cases}
\]
By matching left and right derivatives of piecewise-polynomial functions, we can
similarly construct $C^1$ or $C^k$ functions with compact support. Using $\phi$, however,
we can construct a smooth ($C^\infty$) function with compact support, which might seem
unexpected at first sight.


Example 10.33. The function
\[
\eta(x) = \begin{cases} \exp\left[-1/(1 - x^2)\right] & \text{if } |x| < 1, \\ 0 & \text{if } |x| \ge 1, \end{cases}
\]
is infinitely differentiable on $\mathbb{R}$, since $\eta(x) = \phi(1 - x^2)$ is a composition of smooth
functions. Moreover, it vanishes for $|x| \ge 1$, so it is a smooth function with compact
support. Figure 4 shows its graph. This function is sometimes called a ‘bump’
function.
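
As a small numerical illustration of Example 10.33, the following Python sketch evaluates the bump function as the composition $\eta(x) = \phi(1 - x^2)$. The function names `phi` and `eta` simply mirror the symbols in the text; floating-point evaluation is an assumption of the sketch and says nothing about smoothness.

```python
import math

def phi(x):
    """phi(x) = exp(-1/x) for x > 0 and 0 for x <= 0 (Proposition 10.29)."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def eta(x):
    """Bump function eta(x) = phi(1 - x**2) from Example 10.33."""
    return phi(1.0 - x * x)

# Sample the bump inside, on the boundary of, and outside the interval (-1, 1).
for x in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
    print(f"eta({x:+.1f}) = {eta(x):.6f}")
```

The values are strictly positive for $|x| < 1$, with maximum $e^{-1}$ at $x = 0$, and exactly zero for $|x| \ge 1$.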


The function $\phi$ defined in Proposition 10.29 illustrates that knowing the values
of a smooth function and all of its derivatives at one point does not tell us anything 
about the values of the function at nearby points. This behavior contrasts with, 
and highlights, the remarkable property of analytic functions that the values of an 
analytic function and all of its derivatives at a single point of an interval determine 
the function on the whole interval. 


We make this principle of analytic continuation precise in the following propo- 
sition. The proof uses a common trick of going from a local result (equality of 
functions in a neighborhood of a point) to a global result (equality of functions 
on the whole of their connected domain) by proving that an appropriate subset is 
open, closed, and non-empty. 


Figure 4. Plot of the smooth, compactly supported “bump” function defined
in Example 10.33.


Proposition 10.34. Suppose that $f, g : (a,b) \to \mathbb{R}$ are analytic functions on an
open interval $(a,b)$. If $f^{(n)}(c) = g^{(n)}(c)$ for all $n \ge 0$ at some point $c \in (a,b)$, then
$f = g$ on $(a,b)$.


Proof. Let
\[
E = \left\{ x \in (a,b) : f^{(n)}(x) = g^{(n)}(x) \text{ for all } n \ge 0 \right\}.
\]
The continuity of the derivatives $f^{(n)}$, $g^{(n)}$ implies that $E$ is closed in $(a,b)$: If
$x_k \in E$ and $x_k \to x \in (a,b)$, then
\[
f^{(n)}(x) = \lim_{k \to \infty} f^{(n)}(x_k) = \lim_{k \to \infty} g^{(n)}(x_k) = g^{(n)}(x),
\]
so $x \in E$, and $E$ is closed.

The analyticity of $f$, $g$ implies that $E$ is open in $(a,b)$: If $x \in E$, then $f = g$
in some open interval $(x - r, x + r)$ with $r > 0$, since both functions have the same
Taylor coefficients and convergent power series centered at $x$, so $f^{(n)} = g^{(n)}$ in
$(x - r, x + r)$, meaning that $(x - r, x + r) \subset E$, and $E$ is open.

From Theorem the interval (a,b) is connected, meaning that the only 


subsets that are open and closed in $(a,b)$ are the empty set and the entire interval.
But $E \ne \emptyset$ since $c \in E$, so $E = (a,b)$, which proves the result.


It is worth noting the choice of the set $E$ in the preceding proof. For example,
the proof would not work if we try to use the set
\[
\tilde{E} = \{ x \in (a,b) : f(x) = g(x) \}
\]
instead of $E$. The continuity of $f$, $g$ implies that $\tilde{E}$ is closed, but $\tilde{E}$ is not, in
general, open.

One particular consequence of Proposition 10.34 is that a non-zero analytic
function on $\mathbb{R}$ cannot have compact support, since an analytic function on $\mathbb{R}$ that
is equal to zero on any interval $(a,b) \subset \mathbb{R}$ must equal zero on $\mathbb{R}$. Thus, the
non-analyticity of the ‘bump’-function $\eta$ in Example 10.33 is essential.


Chapter 11 


The Riemann Integral 


I know of some universities in England where the Lebesgue integral is 
taught in the first year of a mathematics degree instead of the Riemann 
integral, but I know of no universities in England where students learn 
the Lebesgue integral in the first year of a mathematics degree.
(Approximate quotation attributed to T. W. Körner)


Let $f : [a,b] \to \mathbb{R}$ be a bounded (not necessarily continuous) function on a
compact (closed, bounded) interval. We will define what it means for $f$ to be
Riemann integrable on $[a,b]$ and, in that case, define its Riemann integral $\int_a^b f$.
The integral of $f$ on $[a,b]$ is a real number whose geometrical interpretation is the
signed area under the graph $y = f(x)$ for $a \le x \le b$. This number is also called
the definite integral of $f$. By integrating $f$ over an interval $[a, x]$ with varying right
end-point, we get a function of $x$, called an indefinite integral of $f$.


The most important result about integration is the fundamental theorem of 
calculus, which states that integration and differentiation are inverse operations in 
an appropriately understood sense. Among other things, this connection enables 
us to compute many integrals explicitly. We will prove the fundamental theorem in 
the next chapter. In this chapter, we define the Riemann integral and prove some 
of its basic properties. 


Integrability is a less restrictive condition on a function than differentiabil- 
ity. Generally speaking, integration makes functions smoother, while differentiation 
makes functions rougher. For example, the indefinite integral of every continuous 
function exists and is differentiable, whereas the derivative of a continuous function 
need not exist (and typically doesn’t). 


The Riemann integral is the simplest integral to define, and it allows one to 
integrate every continuous function as well as some not-too-badly discontinuous 
functions. There are, however, many other types of integrals, the most important 
of which is the Lebesgue integral. The Lebesgue integral allows one to integrate 
unbounded or highly discontinuous functions whose Riemann integrals do not exist,
and it has better mathematical properties than the Riemann integral. The defini-
tion of the Lebesgue integral is more involved, requiring the use of measure theory, 
and we will not discuss it here. In any event, the Riemann integral is adequate for 
many purposes, and even if one needs the Lebesgue integral, it is best to understand 
the Riemann integral first. 


11.1. The supremum and infimum of functions 


In this section we collect some results about the supremum and infimum of functions 
that we use to study Riemann integration. These results can be referred back to as 
needed. 

From Definition 6.11, the supremum or infimum of a function is the supremum
or infimum of its range, and results about the supremum or infimum of sets translate 
immediately to results about functions. There are, however, a few differences, which 
come from the fact that we often compare the values of functions at the same point, 
rather than all of their values simultaneously. 


Inequalities and operations on functions are defined pointwise as usual; for
example, if $f, g : A \to \mathbb{R}$, then $f \le g$ means that $f(x) \le g(x)$ for every $x \in A$, and
$f + g : A \to \mathbb{R}$ is defined by $(f + g)(x) = f(x) + g(x)$.


Proposition 11.1. Suppose that $f, g : A \to \mathbb{R}$ and $f \le g$. Then
\[
\sup_A f \le \sup_A g, \qquad \inf_A f \le \inf_A g.
\]

Proof. If $\sup_A g = \infty$, then $\sup_A f \le \sup_A g$. Otherwise, if $f \le g$ and $g$ is bounded
from above, then
\[
f(x) \le g(x) \le \sup_A g \quad \text{for every } x \in A.
\]
Thus, $f$ is bounded from above by $\sup_A g$, so $\sup_A f \le \sup_A g$. Similarly, $-f \ge -g$
implies that $\sup_A(-f) \ge \sup_A(-g)$, so $\inf_A f \le \inf_A g$.


Note that $f \le g$ does not imply that $\sup_A f \le \inf_A g$; to get that conclusion,
we need to know that $f(x) \le g(y)$ for all $x, y \in A$ and use Proposition 2.24.


Example 11.2. Define $f, g : [0,1] \to \mathbb{R}$ by $f(x) = 2x$, $g(x) = 2x + 1$. Then $f < g$
and
\[
\sup_{[0,1]} f = 2, \qquad \inf_{[0,1]} f = 0, \qquad \sup_{[0,1]} g = 3, \qquad \inf_{[0,1]} g = 1.
\]
Thus, $\sup_{[0,1]} f > \inf_{[0,1]} g$ even though $f < g$.

As for sets, the supremum and infimum of functions do not, in general, preserve 


strict inequalities, and a function need not attain its supremum or infimum even if 
it exists. 


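Example 11.3. Define $f : [0,1] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} x & \text{if } 0 \le x < 1, \\ 0 & \text{if } x = 1. \end{cases}
\]
Then $f < 1$ on $[0,1]$ but $\sup_{[0,1]} f = 1$, and there is no point $x \in [0,1]$ such that
$f(x) = 1$.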




Next, we consider the supremum and infimum of linear combinations of
functions. Multiplication of a function by a positive constant multiplies the inf or sup,
while multiplication by a negative constant switches the inf and sup.


Proposition 11.4. Suppose that $f : A \to \mathbb{R}$ is a bounded function and $c \in \mathbb{R}$. If
$c \ge 0$, then
\[
\sup_A cf = c \sup_A f, \qquad \inf_A cf = c \inf_A f.
\]
If $c < 0$, then
\[
\sup_A cf = c \inf_A f, \qquad \inf_A cf = c \sup_A f.
\]

Proof. Apply Proposition 2.23 to the set $\{cf(x) : x \in A\} = c\{f(x) : x \in A\}$.

For sums of functions, we get an inequality.

Proposition 11.5. If $f, g : A \to \mathbb{R}$ are bounded functions, then
\[
\sup_A (f + g) \le \sup_A f + \sup_A g, \qquad \inf_A (f + g) \ge \inf_A f + \inf_A g.
\]

Proof. Since $f(x) \le \sup_A f$ and $g(x) \le \sup_A g$ for every $x \in A$, we have
\[
f(x) + g(x) \le \sup_A f + \sup_A g.
\]
Thus, $f + g$ is bounded from above by $\sup_A f + \sup_A g$, so
\[
\sup_A (f + g) \le \sup_A f + \sup_A g.
\]
The proof for the infimum is analogous (or apply the result for the supremum to
the functions $-f$, $-g$).


We may have strict inequality in Proposition 11.5 because $f$ and $g$ may take
values close to their suprema (or infima) at different points.


Example 11.6. Define $f, g : [0,1] \to \mathbb{R}$ by $f(x) = x$, $g(x) = 1 - x$. Then
\[
\sup_{[0,1]} f = \sup_{[0,1]} g = \sup_{[0,1]} (f + g) = 1,
\]
so $\sup(f + g) = 1$ but $\sup f + \sup g = 2$. Here, $f$ attains its supremum at 1, while
$g$ attains its supremum at 0.
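
The strict inequality in Example 11.6 can be checked on a finite grid. The Python sketch below is only an illustration under an assumption: it samples $f$ and $g$ at the points $k/10$, which happens to give the true suprema here because both functions are linear and the grid contains the endpoints. In general, a maximum over sample points is only a lower bound for a supremum.

```python
from fractions import Fraction

# Sample points k/10 in [0, 1]; exact rational arithmetic avoids rounding.
grid = [Fraction(k, 10) for k in range(11)]

f = lambda x: x        # attains its supremum at x = 1
g = lambda x: 1 - x    # attains its supremum at x = 0

sup_f = max(f(x) for x in grid)
sup_g = max(g(x) for x in grid)
sup_sum = max(f(x) + g(x) for x in grid)

# Suprema over the grid, followed by the sum of the separate suprema.
print(sup_f, sup_g, sup_sum, sup_f + sup_g)
```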


Finally, we prove some inequalities that involve the absolute value. 


Proposition 11.7. If $f, g : A \to \mathbb{R}$ are bounded functions, then
\[
\left| \sup_A f - \sup_A g \right| \le \sup_A |f - g|, \qquad
\left| \inf_A f - \inf_A g \right| \le \sup_A |f - g|.
\]

Proof. Since $f = f - g + g$ and $f - g \le |f - g|$, we get from Proposition 11.1 and
Proposition 11.5 that
\[
\sup_A f \le \sup_A (f - g) + \sup_A g \le \sup_A |f - g| + \sup_A g,
\]
so
\[
\sup_A f - \sup_A g \le \sup_A |f - g|.
\]
Exchanging $f$ and $g$ in this inequality, we get
\[
\sup_A g - \sup_A f \le \sup_A |f - g|,
\]
which implies that
\[
\left| \sup_A f - \sup_A g \right| \le \sup_A |f - g|.
\]
Replacing $f$ by $-f$ and $g$ by $-g$ in this inequality, we get
\[
\left| \inf_A f - \inf_A g \right| \le \sup_A |f - g|,
\]
where we use the fact that $\sup(-f) = -\inf f$.


Proposition 11.8. If $f, g : A \to \mathbb{R}$ are bounded functions such that
\[
|f(x) - f(y)| \le |g(x) - g(y)| \quad \text{for all } x, y \in A,
\]
then
\[
\sup_A f - \inf_A f \le \sup_A g - \inf_A g.
\]

Proof. The condition implies that for all $x, y \in A$, we have
\[
f(x) - f(y) \le |g(x) - g(y)| = \max[g(x), g(y)] - \min[g(x), g(y)] \le \sup_A g - \inf_A g,
\]
which implies that
\[
\sup\{ f(x) - f(y) : x, y \in A \} \le \sup_A g - \inf_A g.
\]
From Proposition 2.24, we have
\[
\sup\{ f(x) - f(y) : x, y \in A \} = \sup_A f - \inf_A f,
\]
so the result follows.


11.2. Definition of the integral 


The definition of the integral is more involved than the definition of the derivative. 
The derivative is approximated by difference quotients, whereas the integral is 
approximated by upper and lower sums based on a partition of an interval. 

We say that two intervals are almost disjoint if they are disjoint or intersect 
only at a common endpoint. For example, the intervals [0,1] and [1,3] are almost 
disjoint, whereas the intervals [0,2] and [1,3] are not. 


Definition 11.9. Let $I$ be a nonempty, compact interval. A partition of $I$ is a
finite collection $\{I_1, I_2, \dots, I_n\}$ of almost disjoint, nonempty, compact subintervals
whose union is $I$.


A partition of $[a,b]$ with subintervals $I_k = [x_{k-1}, x_k]$ is determined by the set
of endpoints of the intervals
\[
a = x_0 < x_1 < x_2 < \dots < x_{n-1} < x_n = b.
\]
Abusing notation, we will denote a partition $P$ either by its intervals
\[
P = \{I_1, I_2, \dots, I_n\}
\]
or by the set of endpoints of the intervals
\[
P = \{x_0, x_1, x_2, \dots, x_{n-1}, x_n\}.
\]
We’ll adopt either notation as convenient; the context should make it clear which
one is being used. There is always one more endpoint than interval.


Example 11.10. The set of intervals
\[
\{[0, 1/5], [1/5, 1/4], [1/4, 1/3], [1/3, 1/2], [1/2, 1]\}
\]
is a partition of $[0,1]$. The corresponding set of endpoints is
\[
\{0, 1/5, 1/4, 1/3, 1/2, 1\}.
\]


We denote the length of an interval $I = [a,b]$ by
\[
|I| = b - a.
\]
Note that the sum of the lengths $|I_k| = x_k - x_{k-1}$ of the almost disjoint subintervals
in a partition $\{I_1, I_2, \dots, I_n\}$ of an interval $I$ is equal to the length of the whole interval.
This is obvious geometrically; algebraically, it follows from the telescoping series
\begin{align*}
\sum_{k=1}^{n} |I_k| &= \sum_{k=1}^{n} (x_k - x_{k-1}) \\
&= x_n - x_{n-1} + x_{n-1} - x_{n-2} + \dots + x_2 - x_1 + x_1 - x_0 \\
&= x_n - x_0 \\
&= |I|.
\end{align*}


Suppose that $f : [a,b] \to \mathbb{R}$ is a bounded function on the compact interval
$I = [a,b]$ with
\[
M = \sup_I f, \qquad m = \inf_I f.
\]
If $P = \{I_1, I_2, \dots, I_n\}$ is a partition of $I$, let
\[
M_k = \sup_{I_k} f, \qquad m_k = \inf_{I_k} f.
\]
These suprema and infima are well-defined, finite real numbers since $f$ is bounded.
Moreover,
\[
m \le m_k \le M_k \le M.
\]
If $f$ is continuous on the interval $I$, then it is bounded and attains its maximum
and minimum values on each subinterval, but a bounded discontinuous function
need not attain its supremum or infimum.


We define the upper Riemann sum of $f$ with respect to the partition $P$ by
\[
U(f; P) = \sum_{k=1}^{n} M_k |I_k| = \sum_{k=1}^{n} M_k (x_k - x_{k-1})
\]
and the lower Riemann sum of $f$ with respect to the partition $P$ by
\[
L(f; P) = \sum_{k=1}^{n} m_k |I_k| = \sum_{k=1}^{n} m_k (x_k - x_{k-1}).
\]
Geometrically, $U(f; P)$ is the sum of the signed areas of rectangles based on the
intervals $I_k$ that lie above the graph of $f$, and $L(f; P)$ is the sum of the signed
areas of rectangles that lie below the graph of $f$. Note that
\[
m(b - a) \le L(f; P) \le U(f; P) \le M(b - a).
\]
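
For a general bounded function the suprema $M_k$ and infima $m_k$ cannot be read off from finitely many samples, but for an increasing function they are simply the values at the right and left endpoints of each subinterval. Under that assumption, the upper and lower sums can be sketched in Python as follows; the helper name `riemann_sums_increasing` is hypothetical, and the partition is the one from Example 11.10.

```python
from fractions import Fraction

def riemann_sums_increasing(f, endpoints):
    """Return (L, U) for an increasing f on the partition with sorted endpoints.

    Assumes f is increasing, so on [x_{k-1}, x_k] the infimum is f(x_{k-1})
    and the supremum is f(x_k).
    """
    lower = upper = 0
    for left, right in zip(endpoints, endpoints[1:]):
        width = right - left              # |I_k| = x_k - x_{k-1}
        lower += f(left) * width          # m_k |I_k|
        upper += f(right) * width         # M_k |I_k|
    return lower, upper

# Endpoints of the partition of [0, 1] in Example 11.10, with f(x) = x.
P = [Fraction(0), Fraction(1, 5), Fraction(1, 4),
     Fraction(1, 3), Fraction(1, 2), Fraction(1)]
L, U = riemann_sums_increasing(lambda x: x, P)

m, M = 0, 1            # inf and sup of f(x) = x on [0, 1]
a, b = P[0], P[-1]
print(L, U)
```

With exact rational arithmetic, the chain $m(b-a) \le L(f;P) \le U(f;P) \le M(b-a)$ can be verified directly for this partition.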


Let $\Pi(a, b)$, or $\Pi$ for short, denote the collection of all partitions of $[a,b]$. We
define the upper Riemann integral of $f$ on $[a,b]$ by
\[
U(f) = \inf_{P \in \Pi} U(f; P).
\]
The set $\{U(f; P) : P \in \Pi\}$ of all upper Riemann sums of $f$ is bounded from
below by $m(b - a)$, so this infimum is well-defined and finite. Similarly, the set
$\{L(f; P) : P \in \Pi\}$ of all lower Riemann sums is bounded from above by $M(b - a)$,
and we define the lower Riemann integral of $f$ on $[a, b]$ by
\[
L(f) = \sup_{P \in \Pi} L(f; P).
\]
These upper and lower sums and integrals depend on the interval $[a, b]$ as well as the
function $f$, but to simplify the notation we won’t show this explicitly. A commonly
used alternative notation for the upper and lower integrals is
\[
U(f) = \overline{\int_a^b} f, \qquad L(f) = \underline{\int_a^b} f.
\]


Note the use of “lower-upper” and “upper-lower” approximations for the
integrals: we take the infimum of the upper sums and the supremum of the lower
sums. As we show in Proposition 11.22 below, we always have $L(f) \le U(f)$, but
in general the upper and lower integrals need not be equal. We define Riemann
integrability by their equality.


Definition 11.11. A function $f : [a,b] \to \mathbb{R}$ is Riemann integrable on $[a,b]$ if it
is bounded and its upper integral $U(f)$ and lower integral $L(f)$ are equal. In that
case, the Riemann integral of $f$ on $[a,b]$, denoted by
\[
\int_a^b f, \qquad \int_a^b f(x)\, dx, \qquad \int_{[a,b]} f
\]
or similar notations, is the common value of $U(f)$ and $L(f)$.

An unbounded function is not Riemann integrable. In the following,
“integrable” will mean “Riemann integrable,” and “integral” will mean “Riemann
integral” unless stated explicitly otherwise.


11.2.1. Examples. Let us illustrate the definition of Riemann integrability 
with a number of examples. 


Example 11.12. Define $f : [0,1] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} 1/x & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]
Then
\[
\int_0^1 \frac{1}{x}\, dx
\]
isn’t defined as a Riemann integral because $f$ is unbounded. In fact, if
\[
0 < x_1 < x_2 < \dots < x_{n-1} < 1
\]
is a partition of $[0, 1]$, then
\[
\sup_{[0, x_1]} f = \infty,
\]
so the upper Riemann sums of $f$ are not well-defined.


An integral with an unbounded interval of integration, such as
\[
\int_1^\infty \frac{1}{x}\, dx,
\]
also isn’t defined as a Riemann integral. In this case, a partition of $[1, \infty)$ into
finitely many intervals contains at least one unbounded interval, so the
corresponding Riemann sum is not well-defined. A partition of $[1, \infty)$ into bounded intervals
(for example, $I_k = [k, k+1]$ with $k \in \mathbb{N}$) gives an infinite series rather than a finite
Riemann sum, leading to questions of convergence.


One can interpret the integrals in this example as limits of Riemann integrals,
or improper Riemann integrals,
\[
\int_0^1 \frac{1}{x}\, dx = \lim_{\epsilon \to 0^+} \int_\epsilon^1 \frac{1}{x}\, dx, \qquad
\int_1^\infty \frac{1}{x}\, dx = \lim_{r \to \infty} \int_1^r \frac{1}{x}\, dx,
\]
but these are not proper Riemann integrals in the sense of Definition 11.11. Such
improper Riemann integrals involve two limits: a limit of Riemann sums to
define the Riemann integrals, followed by a limit of Riemann integrals. Both of the
improper integrals in this example diverge to infinity. (See Section 12.4.)


Next, we consider some examples of bounded functions on compact intervals. 


Example 11.13. The constant function $f(x) = 1$ on $[0,1]$ is Riemann integrable,
and
\[
\int_0^1 1\, dx = 1.
\]
To show this, let $P = \{I_1, I_2, \dots, I_n\}$ be any partition of $[0,1]$ with endpoints
\[
\{0, x_1, x_2, \dots, x_{n-1}, 1\}.
\]
Since $f$ is constant,
\[
M_k = \sup_{I_k} f = 1, \qquad m_k = \inf_{I_k} f = 1 \qquad \text{for } k = 1, \dots, n,
\]
and therefore
\[
U(f; P) = L(f; P) = \sum_{k=1}^{n} (x_k - x_{k-1}) = x_n - x_0 = 1.
\]
Geometrically, this equation is the obvious fact that the sum of the areas of the
rectangles over (or, equivalently, under) the graph of a constant function is exactly
equal to the area under the graph. Thus, every upper and lower sum of $f$ on $[0,1]$
is equal to 1, which implies that the upper and lower integrals
\[
U(f) = \inf_{P \in \Pi} U(f; P) = \inf\{1\} = 1, \qquad
L(f) = \sup_{P \in \Pi} L(f; P) = \sup\{1\} = 1
\]
are equal, and the integral of $f$ is 1.


More generally, the same argument shows that every constant function $f(x) = c$
is integrable and
\[
\int_a^b c\, dx = c(b - a).
\]

The following is an example of a discontinuous function that is Riemann integrable.

Example 11.14. The function
\[
f(x) = \begin{cases} 0 & \text{if } 0 < x \le 1, \\ 1 & \text{if } x = 0 \end{cases}
\]
is Riemann integrable, and
\[
\int_0^1 f\, dx = 0.
\]
To show this, let $P = \{I_1, I_2, \dots, I_n\}$ be a partition of $[0,1]$. Then, since $f(x) = 0$
for $x > 0$,
\[
M_k = \sup_{I_k} f = 0, \qquad m_k = \inf_{I_k} f = 0 \qquad \text{for } k = 2, \dots, n.
\]
The first interval in the partition is $I_1 = [0, x_1]$, where $0 < x_1 \le 1$, and
\[
M_1 = 1, \qquad m_1 = 0,
\]
since $f(0) = 1$ and $f(x) = 0$ for $0 < x \le x_1$. It follows that
\[
U(f; P) = x_1, \qquad L(f; P) = 0.
\]
Thus, $L(f) = 0$ and
\[
U(f) = \inf\{x_1 : 0 < x_1 \le 1\} = 0,
\]
so $U(f) = L(f) = 0$ are equal, and the integral of $f$ is 0. In this example, the
infimum of the upper Riemann sums is not attained and $U(f; P) > U(f)$ for every
partition $P$.


A similar argument shows that a function $f : [a,b] \to \mathbb{R}$ that is zero except at
finitely many points in $[a, b]$ is Riemann integrable with integral 0.

The next example is a bounded function on a compact interval whose Riemann 
integral doesn’t exist. 


Example 11.15. The Dirichlet function $f : [0,1] \to \mathbb{R}$ is defined by
\[
f(x) = \begin{cases} 1 & \text{if } x \in [0,1] \cap \mathbb{Q}, \\ 0 & \text{if } x \in [0,1] \setminus \mathbb{Q}. \end{cases}
\]
That is, $f$ is one at every rational number and zero at every irrational number.

This function is not Riemann integrable. If $P = \{I_1, I_2, \dots, I_n\}$ is a partition
of $[0,1]$, then
\[
M_k = \sup_{I_k} f = 1, \qquad m_k = \inf_{I_k} f = 0,
\]
since every interval of non-zero length contains both rational and irrational
numbers. It follows that
\[
U(f; P) = 1, \qquad L(f; P) = 0
\]
for every partition $P$ of $[0,1]$, so $U(f) = 1$ and $L(f) = 0$ are not equal.




The Dirichlet function is discontinuous at every point of [0,1], and the moral 
of the last example is that the Riemann integral of a highly discontinuous function 
need not exist. Nevertheless, some fairly discontinuous functions are still Riemann 
integrable. 


Example 11.16. The Thomae function defined in Example is Riemann inte- 
grable. The proof is left as an exercise. 


Theorem 11.58 and Theorem 11.61 below give precise statements of the extent
to which a Riemann integrable function can be discontinuous.


11.2.2. Refinements of partitions. As the previous examples illustrate, a
direct verification of integrability from Definition 11.11 is unwieldy even for the
simplest functions because we have to consider all possible partitions of the interval
of integration. To give an effective analysis of Riemann integrability, we need to
study how upper and lower sums behave under the refinement of partitions.


Definition 11.17. A partition $Q = \{J_1, J_2, \dots, J_m\}$ is a refinement of a partition
$P = \{I_1, I_2, \dots, I_n\}$ if every interval $I_k$ in $P$ is an almost disjoint union of one or
more intervals $J_\ell$ in $Q$.


Equivalently, if we represent partitions by their endpoints, then $Q$ is a
refinement of $P$ if $Q \supset P$, meaning that every endpoint of $P$ is an endpoint of $Q$. We
don’t require that every interval (or even any interval) in a partition has to be
split into smaller intervals to obtain a refinement; for example, every partition is a
refinement of itself.


Example 11.18. Consider the partitions of [0,1] with endpoints 
P = {0,1/2, 1}, Q = {0, 1/3, 2/3, 1}, R = {0, 1/4, 1/2, 3/4, 1}. 


Thus, P, Q, and R partition [0, 1] into intervals of equal length 1/2, 1/3, and 1/4, 
respectively. Then Q is not a refinement of P but R is a refinement of P. 


Given two partitions, neither one need be a refinement of the other. However,
two partitions $P$, $Q$ always have a common refinement; the smallest one is
$R = P \cup Q$, meaning that the endpoints of $R$ are exactly the endpoints of $P$ or $Q$ (or
both).


Example 11.19. Let $P = \{0, 1/2, 1\}$ and $Q = \{0, 1/3, 2/3, 1\}$, as in Example 11.18.
Then $Q$ isn’t a refinement of $P$ and $P$ isn’t a refinement of $Q$. The partition
$S = P \cup Q$, or
\[
S = \{0, 1/3, 1/2, 2/3, 1\},
\]
is a refinement of both $P$ and $Q$. The partition $S$ is not a refinement of $R$, but
$T = R \cup S$, or
\[
T = \{0, 1/4, 1/3, 1/2, 2/3, 3/4, 1\},
\]
is a common refinement of all of the partitions $\{P, Q, R, S\}$.
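
When partitions are represented by their endpoint sets, refinement is set inclusion and the smallest common refinement is a union. The following Python sketch reproduces Examples 11.18 and 11.19 under that representation; the helper `is_refinement` is a name introduced here for illustration.

```python
from fractions import Fraction as F

# Endpoint sets of the partitions of [0, 1] in Examples 11.18 and 11.19.
P = {F(0), F(1, 2), F(1)}
Q = {F(0), F(1, 3), F(2, 3), F(1)}
R = {F(0), F(1, 4), F(1, 2), F(3, 4), F(1)}

def is_refinement(fine, coarse):
    """fine refines coarse when every endpoint of coarse is an endpoint of fine."""
    return coarse <= fine

S = P | Q    # smallest common refinement of P and Q
T = R | S    # common refinement of P, Q, R, and S

print(sorted(S))
print(sorted(T))
```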


As we show next, refining partitions decreases upper sums and increases lower 
sums. (The proof is easier to understand than it is to write out — draw a picture!) 




Theorem 11.20. Suppose that $f : [a,b] \to \mathbb{R}$ is bounded, $P$ is a partition of $[a, b]$,
and $Q$ is a refinement of $P$. Then
\[
U(f; Q) \le U(f; P), \qquad L(f; P) \le L(f; Q).
\]


Proof. Let
\[
P = \{I_1, I_2, \dots, I_n\}, \qquad Q = \{J_1, J_2, \dots, J_m\}
\]
be partitions of $[a,b]$, where $Q$ is a refinement of $P$, so $m \ge n$. We list the intervals
in increasing order of their endpoints. Define
\[
M_k = \sup_{I_k} f, \qquad m_k = \inf_{I_k} f, \qquad
M'_\ell = \sup_{J_\ell} f, \qquad m'_\ell = \inf_{J_\ell} f.
\]
Since $Q$ is a refinement of $P$, each interval $I_k$ in $P$ is an almost disjoint union of
intervals in $Q$, which we can write as
\[
I_k = \bigcup_{\ell = p_k}^{q_k} J_\ell
\]
for some indices $p_k \le q_k$. If $p_k < q_k$, then $I_k$ is split into two or more smaller
intervals in $Q$, and if $p_k = q_k$, then $I_k$ belongs to both $P$ and $Q$. Since the intervals
are listed in order, we have
\[
p_1 = 1, \qquad p_{k+1} = q_k + 1, \qquad q_n = m.
\]
If $p_k \le \ell \le q_k$, then $J_\ell \subset I_k$, so
\[
M'_\ell \le M_k, \qquad m'_\ell \ge m_k \qquad \text{for } p_k \le \ell \le q_k.
\]
Using the fact that the sum of the lengths of the $J$-intervals is the length of the
corresponding $I$-interval, we get that
\[
\sum_{\ell = p_k}^{q_k} M'_\ell |J_\ell| \le \sum_{\ell = p_k}^{q_k} M_k |J_\ell|
= M_k \sum_{\ell = p_k}^{q_k} |J_\ell| = M_k |I_k|.
\]
It follows that
\[
U(f; Q) = \sum_{\ell = 1}^{m} M'_\ell |J_\ell|
= \sum_{k=1}^{n} \sum_{\ell = p_k}^{q_k} M'_\ell |J_\ell|
\le \sum_{k=1}^{n} M_k |I_k| = U(f; P).
\]
Similarly,
\[
\sum_{\ell = p_k}^{q_k} m'_\ell |J_\ell| \ge \sum_{\ell = p_k}^{q_k} m_k |J_\ell| = m_k |I_k|,
\]
and
\[
L(f; Q) = \sum_{k=1}^{n} \sum_{\ell = p_k}^{q_k} m'_\ell |J_\ell|
\ge \sum_{k=1}^{n} m_k |I_k| = L(f; P),
\]
which proves the result.
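
The effect of refinement on upper and lower sums can be seen in a small computation. The Python sketch below is an illustration under assumptions, not a substitute for the proof: it takes the increasing function $f(x) = x^2$ on $[0,1]$, so that suprema and infima on each subinterval are endpoint values, and compares the partition $P = \{0, 1/2, 1\}$ with its refinement $R = \{0, 1/4, 1/2, 3/4, 1\}$ from Example 11.18. The helper name `sums_increasing` is hypothetical.

```python
from fractions import Fraction as F

def sums_increasing(f, endpoints):
    """Return (L, U) for an increasing f; endpoints must be sorted."""
    pairs = list(zip(endpoints, endpoints[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

square = lambda x: x * x    # increasing on [0, 1]

P = [F(0), F(1, 2), F(1)]
R = [F(0), F(1, 4), F(1, 2), F(3, 4), F(1)]    # a refinement of P

L_P, U_P = sums_increasing(square, P)
L_R, U_R = sums_increasing(square, R)

# Refinement raises the lower sum and lowers the upper sum.
print(L_P, L_R, U_R, U_P)
```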


It follows from this theorem that all lower sums are less than or equal to all 
upper sums, not just the lower and upper sums associated with the same partition. 


Proposition 11.21. If $f : [a,b] \to \mathbb{R}$ is bounded and $P$, $Q$ are partitions of $[a, b]$,
then
\[
L(f; P) \le U(f; Q).
\]

Proof. Let $R$ be a common refinement of $P$ and $Q$. Then, by Theorem 11.20,
\[
L(f; P) \le L(f; R), \qquad U(f; R) \le U(f; Q).
\]
It follows that
\[
L(f; P) \le L(f; R) \le U(f; R) \le U(f; Q).
\]


An immediate consequence of this result is that the lower integral is always less 
than or equal to the upper integral. 


Proposition 11.22. If $f : [a,b] \to \mathbb{R}$ is bounded, then
\[
L(f) \le U(f).
\]

Proof. Let
\[
A = \{L(f; P) : P \in \Pi\}, \qquad B = \{U(f; P) : P \in \Pi\}.
\]
From Proposition 11.21, $L \le U$ for every $L \in A$ and $U \in B$, so Proposition 2.22
implies that $\sup A \le \inf B$, or $L(f) \le U(f)$.


11.3. The Cauchy criterion for integrability 


The following theorem gives a criterion for integrability that is analogous to the 
Cauchy condition for the convergence of a sequence. 


Theorem 11.23. A bounded function $f : [a,b] \to \mathbb{R}$ is Riemann integrable if and
only if for every $\epsilon > 0$ there exists a partition $P$ of $[a,b]$, which may depend on $\epsilon$,
such that
\[
U(f; P) - L(f; P) < \epsilon.
\]

Proof. First, suppose that the condition holds. Let $\epsilon > 0$ and choose a partition
$P$ that satisfies the condition. Then, since $U(f) \le U(f; P)$ and $L(f; P) \le L(f)$,
we have
\[
0 \le U(f) - L(f) \le U(f; P) - L(f; P) < \epsilon.
\]
Since this inequality holds for every $\epsilon > 0$, we must have $U(f) - L(f) = 0$, and $f$
is integrable.

Conversely, suppose that $f$ is integrable. Given any $\epsilon > 0$, there are partitions
$Q$, $R$ such that
\[
U(f; Q) < U(f) + \frac{\epsilon}{2}, \qquad L(f; R) > L(f) - \frac{\epsilon}{2}.
\]
Let $P$ be a common refinement of $Q$ and $R$. Then, by Theorem 11.20,
\[
U(f; P) - L(f; P) \le U(f; Q) - L(f; R) < U(f) - L(f) + \epsilon.
\]
Since $U(f) = L(f)$, the condition follows.

If $U(f; P) - L(f; P) < \epsilon$, then $U(f; Q) - L(f; Q) < \epsilon$ for every refinement $Q$
of $P$, so the Cauchy condition means that a function is integrable if and only if
its upper and lower sums get arbitrarily close together for all sufficiently refined
partitions.


It is worth considering in more detail what the Cauchy condition in
Theorem 11.23 implies about the behavior of a Riemann integrable function.


Definition 11.24. The oscillation of a bounded function $f$ on a set $A$ is
\[
\operatorname{osc}_A f = \sup_A f - \inf_A f.
\]

If $f : [a,b] \to \mathbb{R}$ is bounded and $P = \{I_1, I_2, \dots, I_n\}$ is a partition of $[a,b]$, then
\[
U(f; P) - L(f; P) = \sum_{k=1}^{n} \sup_{I_k} f \cdot |I_k| - \sum_{k=1}^{n} \inf_{I_k} f \cdot |I_k|
= \sum_{k=1}^{n} \operatorname{osc}_{I_k} f \cdot |I_k|.
\]
A function $f$ is Riemann integrable if we can make $U(f; P) - L(f; P)$ as small as
we wish. This is the case if we can find a sufficiently refined partition $P$ such that
the oscillation of $f$ on most intervals is arbitrarily small, and the sum of the lengths
of the remaining intervals (where the oscillation of $f$ is large) is arbitrarily small.
For example, the discontinuous function in Example 11.14 has zero oscillation on
every interval except the first one, where the function has oscillation one, but the
length of that interval can be made as small as we wish.


Thus, roughly speaking, a function is Riemann integrable if it oscillates by an
arbitrarily small amount except on a finite collection of intervals whose total length
is arbitrarily small. Theorem 11.58 gives a precise statement.


One direct consequence of the Cauchy criterion is that a function is integrable 
if we can estimate its oscillation by the oscillation of an integrable function. 


Proposition 11.25. Suppose that $f, g : [a,b] \to \mathbb{R}$ are bounded functions and $g$ is
integrable on $[a,b]$. If there exists a constant $C \ge 0$ such that
\[
\operatorname{osc}_I f \le C \operatorname{osc}_I g
\]
on every interval $I \subset [a,b]$, then $f$ is integrable.

Proof. If $P = \{I_1, I_2, \dots, I_n\}$ is a partition of $[a,b]$, then
\begin{align*}
U(f; P) - L(f; P) &= \sum_{k=1}^{n} \operatorname{osc}_{I_k} f \cdot |I_k| \\
&\le C \sum_{k=1}^{n} \operatorname{osc}_{I_k} g \cdot |I_k| \\
&\le C \left[ U(g; P) - L(g; P) \right].
\end{align*}
Thus, $f$ satisfies the Cauchy criterion in Theorem 11.23 if $g$ does, which proves that
$f$ is integrable if $g$ is integrable.


We can also use the Cauchy criterion to give a sequential characterization of 
integrability. 


Theorem 11.26. A bounded function $f : [a,b] \to \mathbb{R}$ is Riemann integrable if and
only if there is a sequence $(P_n)$ of partitions such that
\[
\lim_{n \to \infty} \left[ U(f; P_n) - L(f; P_n) \right] = 0.
\]
In that case,
\[
\int_a^b f = \lim_{n \to \infty} U(f; P_n) = \lim_{n \to \infty} L(f; P_n).
\]


Proof. First, suppose that the condition holds. Then, given $\epsilon > 0$, there is an
$n \in \mathbb{N}$ such that $U(f; P_n) - L(f; P_n) < \epsilon$, so Theorem 11.23 implies that $f$ is
integrable and $U(f) = L(f)$.

Furthermore, since $U(f) \le U(f; P_n)$ and $L(f; P_n) \le L(f)$, we have
\[
0 \le U(f; P_n) - U(f) = U(f; P_n) - L(f) \le U(f; P_n) - L(f; P_n).
\]
Since the limit of the right-hand side is zero, the ‘squeeze’ theorem implies that
\[
\lim_{n \to \infty} U(f; P_n) = U(f) = \int_a^b f.
\]
It also follows that
\[
\lim_{n \to \infty} L(f; P_n) = \lim_{n \to \infty} U(f; P_n)
- \lim_{n \to \infty} \left[ U(f; P_n) - L(f; P_n) \right] = \int_a^b f.
\]

Conversely, if $f$ is integrable then, by Theorem 11.23, for every $n \in \mathbb{N}$ there
exists a partition $P_n$ such that
\[
0 \le U(f; P_n) - L(f; P_n) < \frac{1}{n},
\]
and $U(f; P_n) - L(f; P_n) \to 0$ as $n \to \infty$.


Note that if the limits of $U(f;P_n)$ and $L(f;P_n)$ both exist and are equal, then
\[
\lim_{n\to\infty} \left[U(f;P_n) - L(f;P_n)\right] = \lim_{n\to\infty} U(f;P_n) - \lim_{n\to\infty} L(f;P_n),
\]
so the conditions of the theorem are satisfied. Conversely, the proof of the theorem
shows that if the limit of $U(f;P_n) - L(f;P_n)$ is zero, then the limits of $U(f;P_n)$
and $L(f;P_n)$ both exist and are equal. This isn’t true for general sequences, where
one may have $\lim (a_n - b_n) = 0$ even though $\lim a_n$ and $\lim b_n$ don’t exist.


Theorem 11.26 provides one way to prove the existence of an integral and, in
some cases, evaluate it.


Example 11.27. Consider the function $f(x) = x^2$ on $[0,1]$. Let $P_n$ be the partition
of $[0,1]$ into $n$ intervals of equal length $1/n$ with endpoints $x_k = k/n$ for
$k = 0, 1, 2, \dots, n$. If $I_k = [(k-1)/n, k/n]$ is the $k$th interval, then
\[
\sup_{I_k} f = x_k^2, \qquad \inf_{I_k} f = x_{k-1}^2
\]
since $f$ is increasing. Using the formula for the sum of squares
\[
\sum_{k=1}^n k^2 = \frac{1}{6} n(n+1)(2n+1),
\]
we get
\[
U(f;P_n) = \sum_{k=1}^n x_k^2 \cdot \frac{1}{n} = \frac{1}{n^3} \sum_{k=1}^n k^2
= \frac{1}{6}\left(1 + \frac{1}{n}\right)\left(2 + \frac{1}{n}\right)
\]
and
\[
L(f;P_n) = \sum_{k=1}^n x_{k-1}^2 \cdot \frac{1}{n} = \frac{1}{n^3} \sum_{k=1}^{n-1} k^2
= \frac{1}{6}\left(1 - \frac{1}{n}\right)\left(2 - \frac{1}{n}\right).
\]
(See Figure 1.) It follows that
\[
\lim_{n\to\infty} U(f;P_n) = \lim_{n\to\infty} L(f;P_n) = \frac{1}{3},
\]
and Theorem 11.26 implies that $x^2$ is integrable on $[0,1]$ with
\[
\int_0^1 x^2 \, dx = \frac{1}{3}.
\]

Figure 1. Upper and lower Riemann sums for Example 11.27 with n = 5, 10, 50
subintervals of equal length. The panels report upper and lower sums of 0.44 and 0.24
for n = 5, 0.385 and 0.285 for n = 10, and 0.3434 and 0.3234 for n = 50.



The fundamental theorem of calculus, Theorem below, provides a much easier 
way to evaluate this integral, but the Riemann sums provide the basic definition of 
the integral. 
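
As a numerical check on Example 11.27, the following short Python sketch computes the upper and lower sums of $x^2$ on equal partitions of $[0,1]$ in exact rational arithmetic. The function name `upper_lower_sums_x_squared` is ours, introduced only for illustration; the computation relies on the fact, used in the example, that the supremum and infimum of an increasing function on an interval are attained at its right and left endpoints.

```python
from fractions import Fraction


def upper_lower_sums_x_squared(n):
    """Return (U, L) for f(x) = x**2 on [0, 1] with n equal subintervals."""
    width = Fraction(1, n)
    # f is increasing, so the sup on I_k is at the right endpoint k/n
    # and the inf is at the left endpoint (k - 1)/n.
    upper = sum(Fraction(k, n) ** 2 * width for k in range(1, n + 1))
    lower = sum(Fraction(k - 1, n) ** 2 * width for k in range(1, n + 1))
    return upper, lower


for n in (5, 10, 50):
    upper, lower = upper_lower_sums_x_squared(n)
    print(n, float(upper), float(lower))
```

For n = 5, 10, 50 the computed sums agree with the values reported in the panels of Figure 1, and both sums approach 1/3 as n grows.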


11.4. Continuous and monotonic functions 


The Cauchy criterion leads to the following fundamental result that every contin- 
uous function is Riemann integrable. To prove this result, we use the fact that a 
continuous function oscillates by an arbitrarily small amount on every interval of a 
sufficiently refined partition. 


Theorem 11.28. A continuous function $f : [a,b] \to \mathbb{R}$ on a compact interval is
Riemann integrable.


Proof. A continuous function on a compact set is bounded, so we just need to
verify the Cauchy condition in Theorem 11.23.

Let $\epsilon > 0$. A continuous function on a compact set is uniformly continuous, so
there exists $\delta > 0$ such that
\[
|f(x) - f(y)| < \frac{\epsilon}{b-a} \quad \text{for all } x, y \in [a,b] \text{ such that } |x - y| < \delta.
\]
Choose a partition $P = \{I_1, I_2, \dots, I_n\}$ of $[a,b]$ such that $|I_k| < \delta$ for every $k$; for
example, we can take $n$ intervals of equal length $(b-a)/n$ with $n > (b-a)/\delta$.

Since $f$ is continuous, it attains its maximum and minimum values $M_k$ and
$m_k$ on the compact interval $I_k$ at points $x_k$ and $y_k$ in $I_k$. These points satisfy
$|x_k - y_k| < \delta$, so
\[
M_k - m_k = f(x_k) - f(y_k) < \frac{\epsilon}{b-a}.
\]
The upper and lower sums of $f$ therefore satisfy
\[
U(f;P) - L(f;P) = \sum_{k=1}^n M_k |I_k| - \sum_{k=1}^n m_k |I_k|
= \sum_{k=1}^n (M_k - m_k)|I_k|
< \frac{\epsilon}{b-a} \sum_{k=1}^n |I_k|
= \epsilon,
\]
and Theorem 11.23 implies that $f$ is integrable.


Example 11.29. The function $f(x) = x^2$ on $[0,1]$ considered in Example 11.27 is
integrable since it is continuous.


Another class of integrable functions consists of monotonic (increasing or de- 
creasing) functions. 


Theorem 11.30. A monotonic function $f : [a,b] \to \mathbb{R}$ on a compact interval is
Riemann integrable.


Proof. Suppose that $f$ is monotonic increasing, meaning that $f(x) \le f(y)$ for $x \le y$.
Let $P_n = \{I_1, I_2, \dots, I_n\}$ be a partition of $[a,b]$ into $n$ intervals $I_k = [x_{k-1}, x_k]$
of equal length $(b-a)/n$, with endpoints
\[
x_k = a + (b-a)\frac{k}{n}, \qquad k = 0, 1, \dots, n.
\]
Since $f$ is increasing,
\[
M_k = \sup_{I_k} f = f(x_k), \qquad m_k = \inf_{I_k} f = f(x_{k-1}).
\]
Hence, summing a telescoping series, we get
\[
U(f;P_n) - L(f;P_n) = \sum_{k=1}^n (M_k - m_k)(x_k - x_{k-1})
= \frac{b-a}{n} \sum_{k=1}^n \left[f(x_k) - f(x_{k-1})\right]
= \frac{b-a}{n} \left[f(b) - f(a)\right].
\]
It follows that $U(f;P_n) - L(f;P_n) \to 0$ as $n \to \infty$, and Theorem 11.26 implies
that $f$ is integrable.

The proof for a monotonic decreasing function $f$ is similar, with
\[
\sup_{I_k} f = f(x_{k-1}), \qquad \inf_{I_k} f = f(x_k),
\]
or we can apply the result for increasing functions to $-f$ and use Theorem 11.32
below.
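
The telescoping identity in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: `gap_for_increasing` and the test function `step` (an increasing function with jumps at 1/3 and 2/3) are hypothetical names that do not appear in the text. It evaluates $U(f;P_n) - L(f;P_n)$ from endpoint values, as the proof does, using exact rational arithmetic.

```python
from fractions import Fraction


def gap_for_increasing(f, a, b, n):
    """U(f; P_n) - L(f; P_n) for an increasing f on n equal subintervals of [a, b]."""
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / n
    points = [a + k * width for k in range(n + 1)]
    # For increasing f: sup on [x_{k-1}, x_k] is f(x_k), inf is f(x_{k-1}).
    return sum((f(points[k]) - f(points[k - 1])) * width for k in range(1, n + 1))


def step(x):
    """Hypothetical increasing, discontinuous function with jumps at 1/3 and 2/3."""
    return Fraction(int(3 * x >= 1) + int(3 * x >= 2))


for n in (10, 100, 1000):
    # Each gap should equal (b - a) * (f(b) - f(a)) / n = 2 / n.
    print(n, gap_for_increasing(step, 0, 1, n))
```

The gap shrinks like 1/n even though `step` is discontinuous, which is the point of the theorem: monotonicity alone controls the total oscillation.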


Figure 2. The graph of the monotonic function in Example 11.31 with a
countably infinite, dense set of jump discontinuities.


Monotonic functions needn’t be continuous, and they may be discontinuous at 
a countably infinite number of points. 


Example 11.31. Let $\{q_k : k \in \mathbb{N}\}$ be an enumeration of the rational numbers in
$[0,1)$ and let $(a_k)$ be a sequence of strictly positive real numbers such that
\[
\sum_{k=1}^\infty a_k = 1.
\]
Define $f : [0,1] \to \mathbb{R}$ by
\[
f(x) = \sum_{k \in Q(x)} a_k, \qquad Q(x) = \{k \in \mathbb{N} : q_k \in [0,x)\}
\]
for $x > 0$, and $f(0) = 0$. That is, $f(x)$ is obtained by summing the terms in the
series whose indices $k$ correspond to the rational numbers such that $0 \le q_k < x$.

For $x = 1$, this sum includes all the terms in the series, so $f(1) = 1$. For
every $0 < x < 1$, there are infinitely many terms in the sum, since the rationals
are dense in $[0,x)$, and $f$ is increasing, since the number of terms increases with $x$.
By Theorem 11.30, $f$ is Riemann integrable on $[0,1]$. Although $f$ is integrable, it
has a countably infinite number of jump discontinuities at every rational number
in $[0,1)$, which are dense in $[0,1]$. The function is continuous elsewhere (the proof
is left as an exercise).

Figure 2 shows the graph of $f$ corresponding to the enumeration

{0, 1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6,5/6,1/7,...} 


of the rational numbers in [0, 1) and 


What is its Riemann integral? 




11.5. Linearity, monotonicity, and additivity 


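The integral has the following three basic properties.

(1) Linearity:
\[
\int_a^b cf = c \int_a^b f, \qquad \int_a^b (f+g) = \int_a^b f + \int_a^b g.
\]
(2) Monotonicity: if $f \le g$, then
\[
\int_a^b f \le \int_a^b g.
\]
(3) Additivity: if $a < c < b$, then
\[
\int_a^c f + \int_c^b f = \int_a^b f.
\]

These properties are analogous to the corresponding properties of sums (or
convergent series):

(1) Linearity:
\[
\sum_{k=1}^n c a_k = c \sum_{k=1}^n a_k, \qquad \sum_{k=1}^n (a_k + b_k) = \sum_{k=1}^n a_k + \sum_{k=1}^n b_k;
\]
(2) Monotonicity:
\[
\sum_{k=1}^n a_k \le \sum_{k=1}^n b_k \quad \text{if } a_k \le b_k;
\]
(3) Additivity:
\[
\sum_{k=1}^m a_k + \sum_{k=m+1}^n a_k = \sum_{k=1}^n a_k.
\]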


In this section, we prove these properties and derive a few of their consequences. 


11.5.1. Linearity. We begin by proving the linearity. First we prove linearity 
with respect to scalar multiplication and then linearity with respect to sums. 


Theorem 11.32. If $f : [a,b] \to \mathbb{R}$ is integrable and $c \in \mathbb{R}$, then $cf$ is integrable
and
\[
\int_a^b cf = c \int_a^b f.
\]


Proof. Suppose that $c \ge 0$. Then for any set $A \subset [a,b]$, we have
\[
\sup_A cf = c \sup_A f, \qquad \inf_A cf = c \inf_A f,
\]
so $U(cf;P) = cU(f;P)$ for every partition $P$. Taking the infimum over the set $\Pi$
of all partitions of $[a,b]$, we get
\[
U(cf) = \inf_{P \in \Pi} U(cf;P) = \inf_{P \in \Pi} cU(f;P) = c \inf_{P \in \Pi} U(f;P) = cU(f).
\]
Similarly, $L(cf;P) = cL(f;P)$ and $L(cf) = cL(f)$. If $f$ is integrable, then
\[
U(cf) = cU(f) = cL(f) = L(cf),
\]
which shows that $cf$ is integrable and
\[
\int_a^b cf = c \int_a^b f.
\]

Now consider $-f$. Since
\[
\sup_A (-f) = -\inf_A f, \qquad \inf_A (-f) = -\sup_A f,
\]
we have
\[
U(-f;P) = -L(f;P), \qquad L(-f;P) = -U(f;P).
\]
Therefore
\[
U(-f) = \inf_{P \in \Pi} U(-f;P) = \inf_{P \in \Pi} \left[-L(f;P)\right] = -\sup_{P \in \Pi} L(f;P) = -L(f),
\]
\[
L(-f) = \sup_{P \in \Pi} L(-f;P) = \sup_{P \in \Pi} \left[-U(f;P)\right] = -\inf_{P \in \Pi} U(f;P) = -U(f).
\]
Hence, $-f$ is integrable if $f$ is integrable and
\[
\int_a^b (-f) = -\int_a^b f.
\]
Finally, if $c < 0$, then $c = -|c|$, and a successive application of the previous results
shows that $cf$ is integrable with $\int_a^b cf = c \int_a^b f$.


Next, we prove the linearity of the integral with respect to sums. If $f$, $g$ are
bounded, then $f + g$ is bounded and
\[
\sup_I (f+g) \le \sup_I f + \sup_I g, \qquad \inf_I (f+g) \ge \inf_I f + \inf_I g.
\]
It follows that
\[
\operatorname{osc}_I (f+g) \le \operatorname{osc}_I f + \operatorname{osc}_I g,
\]
so $f + g$ is integrable if $f$, $g$ are integrable. In general, however, the upper (or lower)
sum of $f + g$ needn’t be the sum of the corresponding upper (or lower) sums of $f$
and $g$. As a result, we don’t get
\[
\int_a^b (f+g) = \int_a^b f + \int_a^b g
\]
simply by adding upper and lower sums. Instead, we prove this equality by estimating
the upper and lower integrals of $f + g$ from above and below by those of $f$
and $g$.


Theorem 11.33. If $f, g : [a,b] \to \mathbb{R}$ are integrable functions, then $f + g$ is integrable,
and
\[
\int_a^b (f+g) = \int_a^b f + \int_a^b g.
\]


Proof. We first prove that if $f, g : [a,b] \to \mathbb{R}$ are bounded, but not necessarily
integrable, then
\[
U(f+g) \le U(f) + U(g), \qquad L(f+g) \ge L(f) + L(g).
\]
Suppose that $P = \{I_1, I_2, \dots, I_n\}$ is a partition of $[a,b]$. Then
\[
U(f+g;P) = \sum_{k=1}^n \sup_{I_k} (f+g) \cdot |I_k|
\le \sum_{k=1}^n \sup_{I_k} f \cdot |I_k| + \sum_{k=1}^n \sup_{I_k} g \cdot |I_k|
\le U(f;P) + U(g;P).
\]
Let $\epsilon > 0$. Since the upper integral is the infimum of the upper sums, there are
partitions $Q$, $R$ such that
\[
U(f;Q) < U(f) + \frac{\epsilon}{2}, \qquad U(g;R) < U(g) + \frac{\epsilon}{2},
\]
and if $P$ is a common refinement of $Q$ and $R$, then
\[
U(f;P) < U(f) + \frac{\epsilon}{2}, \qquad U(g;P) < U(g) + \frac{\epsilon}{2}.
\]
It follows that
\[
U(f+g) \le U(f+g;P) \le U(f;P) + U(g;P) < U(f) + U(g) + \epsilon.
\]
Since this inequality holds for arbitrary $\epsilon > 0$, we must have $U(f+g) \le U(f) + U(g)$.

Similarly, we have $L(f+g;P) \ge L(f;P) + L(g;P)$ for all partitions $P$, and for
every $\epsilon > 0$, we get $L(f+g) > L(f) + L(g) - \epsilon$, so $L(f+g) \ge L(f) + L(g)$.

For integrable functions $f$ and $g$, it follows that
\[
U(f+g) \le U(f) + U(g) = L(f) + L(g) \le L(f+g).
\]
Since $U(f+g) \ge L(f+g)$, we have $U(f+g) = L(f+g)$ and $f + g$ is integrable.
Moreover, there is equality throughout the previous inequality, which proves the
result.


Although the integral is linear, the upper and lower integrals of non-integrable 
functions are not, in general, linear. 


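Example 11.34. Define $f, g : [0,1] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} 1 & \text{if } x \in [0,1] \cap \mathbb{Q}, \\ 0 & \text{if } x \in [0,1] \setminus \mathbb{Q}, \end{cases}
\qquad
g(x) = \begin{cases} 0 & \text{if } x \in [0,1] \cap \mathbb{Q}, \\ 1 & \text{if } x \in [0,1] \setminus \mathbb{Q}. \end{cases}
\]
That is, $f$ is the Dirichlet function and $g = 1 - f$. Then
\[
U(f) = U(g) = 1, \qquad L(f) = L(g) = 0, \qquad U(f+g) = L(f+g) = 1,
\]
so
\[
U(f+g) < U(f) + U(g), \qquad L(f+g) > L(f) + L(g).
\]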


The product of integrable functions is also integrable, as is the quotient provided
it remains bounded. Unlike the integral of the sum, however, there is no way
to express the integral of the product $\int fg$ in terms of $\int f$ and $\int g$.


Theorem 11.35. If $f, g : [a,b] \to \mathbb{R}$ are integrable, then $fg : [a,b] \to \mathbb{R}$ is integrable.
If, in addition, $g \ne 0$ and $1/g$ is bounded, then $f/g : [a,b] \to \mathbb{R}$ is
integrable.


Proof. First, we show that the square of an integrable function is integrable. If $f$
is integrable, then $f$ is bounded, with $|f| \le M$ for some $M \ge 0$. For all $x, y \in [a,b]$,
we have
\[
\left|f^2(x) - f^2(y)\right| = |f(x) + f(y)| \cdot |f(x) - f(y)| \le 2M |f(x) - f(y)|.
\]
Taking the supremum of this inequality over $x, y \in I \subset [a,b]$ and using Proposition 11.8,
we get that
\[
\sup_I (f^2) - \inf_I (f^2) \le 2M \left[\sup_I f - \inf_I f\right],
\]
meaning that
\[
\operatorname{osc}_I (f^2) \le 2M \operatorname{osc}_I f.
\]
It follows from Proposition 11.25 that $f^2$ is integrable if $f$ is integrable.

Since the integral is linear, we then see from the identity
\[
fg = \frac{1}{4}\left[(f+g)^2 - (f-g)^2\right]
\]
that $fg$ is integrable if $f$, $g$ are integrable. We remark that the trick of representing
a product as a difference of squares isn’t a new one: the ancient Babylonians
apparently used this identity, together with a table of squares, to compute products.

In a similar way, if $g \ne 0$ and $|1/g| \le M$, then
\[
\left|\frac{1}{g(x)} - \frac{1}{g(y)}\right| = \frac{|g(x) - g(y)|}{|g(x) g(y)|} \le M^2 |g(x) - g(y)|.
\]
Taking the supremum of this inequality over $x, y \in I \subset [a,b]$, we get
\[
\sup_I \left(\frac{1}{g}\right) - \inf_I \left(\frac{1}{g}\right) \le M^2 \left[\sup_I g - \inf_I g\right],
\]
meaning that $\operatorname{osc}_I (1/g) \le M^2 \operatorname{osc}_I g$, and Proposition 11.25 implies that $1/g$ is
integrable if $g$ is integrable. Therefore $f/g = f \cdot (1/g)$ is integrable.
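
The difference-of-squares identity used in this proof can be tried out on integers. The following Python sketch is an illustration only: the function `product_via_squares` and the lookup table `table` are hypothetical names, and nothing here is claimed about how such tables were used historically. It multiplies two integers using nothing but additions, subtractions, a table of squares, and a division by 4.

```python
def product_via_squares(a, b, squares):
    """Multiply integers a and b using only a lookup table of squares."""
    # ab = [(a + b)^2 - (a - b)^2] / 4; the numerator always equals 4ab,
    # so the integer division by 4 is exact.
    return (squares[abs(a + b)] - squares[abs(a - b)]) // 4


# Table of squares for 0, 1, ..., 200 (large enough for |a| + |b| <= 200).
table = {n: n * n for n in range(0, 201)}
print(product_via_squares(37, 58, table))
```

The same algebra, applied pointwise to functions, is what reduces the integrability of $fg$ to the integrability of squares.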


11.5.2. Monotonicity. Next, we prove the monotonicity of the integral. 


Theorem 11.36. Suppose that $f, g : [a,b] \to \mathbb{R}$ are integrable and $f \le g$. Then
\[
\int_a^b f \le \int_a^b g.
\]


Proof. First suppose that $f \ge 0$ is integrable. Let $P$ be the partition consisting of
the single interval $[a,b]$. Then
\[
L(f;P) = \inf_{[a,b]} f \cdot (b - a) \ge 0,
\]
so
\[
\int_a^b f \ge L(f;P) \ge 0.
\]
If $f \ge g$, then $h = f - g \ge 0$, and the linearity of the integral implies that
\[
\int_a^b f - \int_a^b g = \int_a^b h \ge 0,
\]
which proves the theorem.




One immediate consequence of this theorem is the following simple, but useful, 
estimate for integrals. 


Theorem 11.37. Suppose that $f : [a,b] \to \mathbb{R}$ is integrable and
\[
M = \sup_{[a,b]} f, \qquad m = \inf_{[a,b]} f.
\]
Then
\[
m(b - a) \le \int_a^b f \le M(b - a).
\]


Proof. Since $m \le f \le M$ on $[a,b]$, Theorem 11.36 implies that
\[
\int_a^b m \le \int_a^b f \le \int_a^b M,
\]
which gives the result.


This estimate also follows from the definition of the integral in terms of upper
and lower sums, but once we’ve established the monotonicity of the integral, we
don’t need to go back to the definition.


A further consequence is the intermediate value theorem for integrals, which 
states that a continuous function on a compact interval is equal to its average value 
at some point in the interval. 


Theorem 11.38. If $f : [a,b] \to \mathbb{R}$ is continuous, then there exists $x \in [a,b]$ such
that
\[
f(x) = \frac{1}{b-a} \int_a^b f.
\]


Proof. Since $f$ is a continuous function on a compact interval, the extreme value
theorem (Theorem 7.37) implies it attains its maximum value $M$ and its minimum
value $m$. From Theorem 11.37,
\[
m \le \frac{1}{b-a} \int_a^b f \le M.
\]
By the intermediate value theorem (Theorem 7.44), $f$ takes on every value between
$m$ and $M$, and the result follows.
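
To make Theorem 11.38 concrete, consider $f(x) = x^2$ on $[0,1]$, whose average value is $1/3$ by Example 11.27, so the theorem guarantees a point $x$ with $x^2 = 1/3$. The sketch below locates it by bisection. The helper `point_with_average_value` is a hypothetical name, and the bisection assumes, in addition to continuity, that $f$ is increasing; the theorem itself needs no such assumption.

```python
import math


def point_with_average_value(f, a, b, average, tol=1e-12):
    """Bisection for x in [a, b] with f(x) = average.

    Assumes f is continuous and increasing on [a, b] and that
    f(a) <= average <= f(b).
    """
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < average:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Average value of x**2 on [0, 1] is 1/3, so x should be 1/sqrt(3).
x = point_with_average_value(lambda t: t * t, 0.0, 1.0, 1.0 / 3.0)
print(x, 1 / math.sqrt(3))
```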


As shown in the proof of Theorem 11.36, given linearity, monotonicity is equivalent
to positivity,
\[
\int_a^b f \ge 0 \quad \text{if } f \ge 0.
\]
We remark that even though the upper and lower integrals aren’t linear, they are
monotone.


Proposition 11.39. If $f, g : [a,b] \to \mathbb{R}$ are bounded functions and $f \le g$, then
\[
U(f) \le U(g), \qquad L(f) \le L(g).
\]


Proof. From Proposition 11.1 we have for every interval $I \subset [a,b]$ that
\[
\sup_I f \le \sup_I g, \qquad \inf_I f \le \inf_I g.
\]
It follows that for every partition $P$ of $[a,b]$, we have
\[
U(f;P) \le U(g;P), \qquad L(f;P) \le L(g;P).
\]
Taking the infimum of the upper inequality and the supremum of the lower inequality
over $P$, we get that $U(f) \le U(g)$ and $L(f) \le L(g)$.


We can estimate the absolute value of an integral by taking the absolute value
under the integral sign. This is analogous to the corresponding property of sums:
\[
\left|\sum_{k=1}^n a_k\right| \le \sum_{k=1}^n |a_k|.
\]


Theorem 11.40. If $f$ is integrable, then $|f|$ is integrable and
\[
\left|\int_a^b f\right| \le \int_a^b |f|.
\]


Proof. First, suppose that $|f|$ is integrable. Since
\[
-|f| \le f \le |f|,
\]
we get from Theorem 11.36 that
\[
-\int_a^b |f| \le \int_a^b f \le \int_a^b |f|, \qquad \text{or} \qquad \left|\int_a^b f\right| \le \int_a^b |f|.
\]

To complete the proof, we need to show that $|f|$ is integrable if $f$ is integrable.
For $x, y \in [a,b]$, the reverse triangle inequality gives
\[
\bigl|\,|f(x)| - |f(y)|\,\bigr| \le |f(x) - f(y)|.
\]
Using Proposition 11.8, we get that
\[
\sup_I |f| - \inf_I |f| \le \sup_I f - \inf_I f,
\]
meaning that $\operatorname{osc}_I |f| \le \operatorname{osc}_I f$. Proposition 11.25 then implies that $|f|$ is integrable
if $f$ is integrable.


In particular, we immediately get the following basic estimate for an integral. 


Corollary 11.41. If $f : [a,b] \to \mathbb{R}$ is integrable and $M = \sup_{[a,b]} |f|$, then
\[
\left|\int_a^b f\right| \le M(b - a).
\]


Finally, we prove a useful positivity result for the integral of continuous functions.


Proposition 11.42. If $f : [a,b] \to \mathbb{R}$ is a continuous function such that $f \ge 0$ and
$\int_a^b f = 0$, then $f = 0$.


Proof. Suppose for contradiction that $f(c) > 0$ for some $a \le c \le b$. For definiteness,
assume that $a < c < b$. (The proof is similar if $c$ is an endpoint.) Then, since
$f$ is continuous, there exists $\delta > 0$ such that
\[
|f(x) - f(c)| \le \frac{f(c)}{2} \quad \text{for } c - \delta \le x \le c + \delta,
\]
where we choose $\delta$ small enough that $c - \delta > a$ and $c + \delta < b$. It follows that
\[
f(x) = f(c) + f(x) - f(c) \ge f(c) - |f(x) - f(c)| \ge \frac{f(c)}{2}
\]
for $c - \delta \le x \le c + \delta$. Using this inequality and the assumption that $f \ge 0$, we get
\[
\int_a^b f = \int_a^{c-\delta} f + \int_{c-\delta}^{c+\delta} f + \int_{c+\delta}^b f
\ge 0 + \frac{f(c)}{2} \cdot 2\delta + 0 > 0.
\]
This contradiction proves the result.


The assumption that $f \ge 0$ is, of course, required, otherwise the integral of the
function may be zero due to cancelation.

Example 11.43. The function $f : [-1,1] \to \mathbb{R}$ defined by $f(x) = x$ is continuous
and nonzero, but $\int_{-1}^1 f = 0$.


Continuity is also required; for example, the discontinuous function in Example 11.14
is nonzero, but its integral is zero.


11.5.3. Additivity. Finally, we prove additivity. This property refers to addi- 
tivity with respect to the interval of integration, rather than linearity with respect 
to the function being integrated. 


Theorem 11.44. Suppose that $f : [a,b] \to \mathbb{R}$ and $a < c < b$. Then $f$ is Riemann
integrable on $[a,b]$ if and only if it is Riemann integrable on $[a,c]$ and $[c,b]$.
Moreover, in that case,
\[
\int_a^b f = \int_a^c f + \int_c^b f.
\]


Proof. Suppose that $f$ is integrable on $[a,b]$. Then, given $\epsilon > 0$, there is a partition
$P$ of $[a,b]$ such that $U(f;P) - L(f;P) < \epsilon$. Let $P' = P \cup \{c\}$ be the refinement
of $P$ obtained by adding $c$ to the endpoints of $P$. (If $c \in P$, then $P' = P$.) Then
$P' = Q \cup R$ where $Q = P' \cap [a,c]$ and $R = P' \cap [c,b]$ are partitions of $[a,c]$ and $[c,b]$
respectively. Moreover,
\[
U(f;P') = U(f;Q) + U(f;R), \qquad L(f;P') = L(f;Q) + L(f;R).
\]
It follows that
\[
U(f;Q) - L(f;Q) = U(f;P') - L(f;P') - \left[U(f;R) - L(f;R)\right]
\le U(f;P) - L(f;P) < \epsilon,
\]
which proves that $f$ is integrable on $[a,c]$. Exchanging $Q$ and $R$, we get the proof
for $[c,b]$.

Conversely, if $f$ is integrable on $[a,c]$ and $[c,b]$, then there are partitions $Q$ of
$[a,c]$ and $R$ of $[c,b]$ such that
\[
U(f;Q) - L(f;Q) < \frac{\epsilon}{2}, \qquad U(f;R) - L(f;R) < \frac{\epsilon}{2}.
\]
Let $P = Q \cup R$. Then
\[
U(f;P) - L(f;P) = U(f;Q) - L(f;Q) + U(f;R) - L(f;R) < \epsilon,
\]
which proves that $f$ is integrable on $[a,b]$.

Finally, if $f$ is integrable, then with the partitions $P$, $Q$, $R$ as above, we have
\[
\int_a^b f \le U(f;P) = U(f;Q) + U(f;R)
\le L(f;Q) + L(f;R) + \epsilon
\le \int_a^c f + \int_c^b f + \epsilon.
\]
Similarly,
\[
\int_a^b f \ge L(f;P) = L(f;Q) + L(f;R)
\ge U(f;Q) + U(f;R) - \epsilon
\ge \int_a^c f + \int_c^b f - \epsilon.
\]
Since $\epsilon > 0$ is arbitrary, we see that $\int_a^b f = \int_a^c f + \int_c^b f$.


We can extend the additivity property of the integral by defining an oriented 
Riemann integral. 


Definition 11.45. If $f : [a,b] \to \mathbb{R}$ is integrable, where $a < b$, and $a \le c \le b$, then
\[
\int_b^a f = -\int_a^b f, \qquad \int_c^c f = 0.
\]


With this definition, the additivity property in Theorem 11.44 holds for all
$a, b, c \in \mathbb{R}$ for which the oriented integrals exist. Moreover, if $|f| \le M$, then the
estimate in Corollary 11.41 becomes
\[
\left|\int_a^b f\right| \le M |b - a|
\]
for all $a, b \in \mathbb{R}$ (even if $a \ge b$).


The oriented Riemann integral is a special case of the integral of a differential
form. It assigns a value to the integral of a one-form $f\,dx$ on an oriented interval.




11.6. Further existence results 


In this section, we prove several further useful conditions for the existence of the
Riemann integral.


First, we show that changing the values of a function at finitely many points
doesn’t change its integrability or the value of its integral.


Proposition 11.46. Suppose that $f, g : [a,b] \to \mathbb{R}$ and $f(x) = g(x)$ except at
finitely many points $x \in [a,b]$. Then $f$ is integrable if and only if $g$ is integrable,
and in that case
\[
\int_a^b f = \int_a^b g.
\]


Proof. It is sufficient to prove the result for functions whose values differ at a
single point, say $c \in [a,b]$. The general result then follows by repeated application
of this result.

Since $f$, $g$ differ at a single point, $f$ is bounded if and only if $g$ is bounded. If $f$,
$g$ are unbounded, then neither one is integrable. If $f$, $g$ are bounded, we will show
that $f$, $g$ have the same upper and lower integrals. The reason is that their upper
and lower sums differ by an arbitrarily small amount with respect to a partition
that is sufficiently refined near the point where the functions differ.

Suppose that $f$, $g$ are bounded with $|f|, |g| \le M$ on $[a,b]$ for some $M > 0$. Let
$\epsilon > 0$. Choose a partition $P$ of $[a,b]$ such that
\[
U(f;P) < U(f) + \frac{\epsilon}{2}.
\]
Let $Q = \{I_1, \dots, I_n\}$ be a refinement of $P$ such that $|I_k| < \delta$ for $k = 1, \dots, n$, where
\[
\delta = \frac{\epsilon}{8M}.
\]
Then $g$ differs from $f$ on at most two intervals in $Q$. (This could happen on two
intervals if $c$ is an endpoint of the partition.) On such an interval $I_k$ we have
\[
\left|\sup_{I_k} g - \sup_{I_k} f\right| \le \sup_{I_k} |g| + \sup_{I_k} |f| \le 2M,
\]
and on the remaining intervals, $\sup_{I_k} g - \sup_{I_k} f = 0$. It follows that
\[
|U(g;Q) - U(f;Q)| < 2M \cdot 2\delta \le \frac{\epsilon}{2}.
\]
Using the properties of upper integrals and refinements, we obtain that
\[
U(g) \le U(g;Q) < U(f;Q) + \frac{\epsilon}{2} \le U(f;P) + \frac{\epsilon}{2} < U(f) + \epsilon.
\]
Since this inequality holds for arbitrary $\epsilon > 0$, we get that $U(g) \le U(f)$. Exchanging
$f$ and $g$, we see similarly that $U(f) \le U(g)$, so $U(f) = U(g)$.

An analogous argument for lower sums (or an application of the result for
upper sums to $-f$, $-g$) shows that $L(f) = L(g)$. Thus $U(f) = L(f)$ if and only if
$U(g) = L(g)$, in which case $\int_a^b f = \int_a^b g$.


Example 11.47. The function $f$ in Example 11.14 differs from the 0-function at
one point. It is integrable and its integral is equal to 0.


The conclusion of Proposition 11.46 can fail if the functions differ at a countably
infinite number of points. One reason is that we can turn a bounded function into
an unbounded function by changing its values at a countably infinite number of
points.


Example 11.48. Define $f : [0,1] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} n & \text{if } x = 1/n \text{ for } n \in \mathbb{N}, \\ 0 & \text{otherwise.} \end{cases}
\]

Then f is equal to the 0-function except on the countably infinite set {1/n : n € N}, 
but f is unbounded and therefore it’s not Riemann integrable. 


The result in Proposition 11.46 is still false, however, for bounded functions
that differ at a countably infinite number of points.


Example 11.49. The Dirichlet function in Example 11.15 is bounded and differs
from the 0-function on the countably infinite set of rationals, but it isn’t Riemann
integrable.


The Lebesgue integral is better behaved than the Riemann integral in this
respect: two functions that are equal almost everywhere, meaning that they differ 
on a set of Lebesgue measure zero, have the same Lebesgue integrals. In particular, 
two functions that differ on a countable set have the same Lebesgue integrals (see 
Section i 

The next proposition allows us to deduce the integrability of a bounded function 
on an interval from its integrability on slightly smaller intervals. 


Proposition 11.50. Suppose that $f : [a,b] \to \mathbb{R}$ is bounded and integrable on
$[a,r]$ for every $a < r < b$. Then $f$ is integrable on $[a,b]$ and
\[
\int_a^b f = \lim_{r \to b^-} \int_a^r f.
\]


Proof. Since $f$ is bounded, $|f| \le M$ on $[a,b]$ for some $M > 0$. Given $\epsilon > 0$, let
\[
r = b - \frac{\epsilon}{4M}
\]
(where we assume $\epsilon$ is sufficiently small that $r > a$). Since $f$ is integrable on $[a,r]$,
there is a partition $Q$ of $[a,r]$ such that
\[
U(f;Q) - L(f;Q) < \frac{\epsilon}{2}.
\]
Then $P = Q \cup \{b\}$ is a partition of $[a,b]$ whose last interval is $[r,b]$. The boundedness
of $f$ implies that
\[
\sup_{[r,b]} f - \inf_{[r,b]} f \le 2M.
\]
Therefore
\[
U(f;P) - L(f;P) = U(f;Q) - L(f;Q) + \left(\sup_{[r,b]} f - \inf_{[r,b]} f\right)(b - r)
< \frac{\epsilon}{2} + 2M \cdot (b - r) = \epsilon,
\]
so $f$ is integrable on $[a,b]$ by Theorem 11.23. Moreover, using the additivity of the
integral, we get
\[
\left|\int_a^b f - \int_a^r f\right| = \left|\int_r^b f\right| \le M \cdot (b - r) \to 0 \quad \text{as } r \to b^-.
\]


An obvious analogous result holds for the left endpoint.


Example 11.51. Define $f : [0,1] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]
Then $f$ is bounded on $[0,1]$. Furthermore, $f$ is continuous and therefore integrable
on $[r,1]$ for every $0 < r < 1$. It follows from Proposition 11.50 that $f$ is integrable
on $[0,1]$.


The assumption in Proposition 11.50 that $f$ is bounded on $[a,b]$ is essential.

Example 11.52. The function $f : [0,1] \to \mathbb{R}$ defined by
\[
f(x) = \begin{cases} 1/x & \text{for } 0 < x \le 1, \\ 0 & \text{for } x = 0, \end{cases}
\]
is continuous and therefore integrable on $[r,1]$ for every $0 < r < 1$, but it’s unbounded
and therefore not integrable on $[0,1]$.


As a corollary of this result and the additivity of the integral, we prove a 
generalization of the integrability of continuous functions to piecewise continuous 
functions. 


Theorem 11.53. If $f : [a,b] \to \mathbb{R}$ is a bounded function with finitely many discontinuities,
then $f$ is Riemann integrable.


Proof. By splitting the interval into subintervals with the discontinuities of $f$ at
an endpoint and using Theorem 11.44, we see that it is sufficient to prove the result
if $f$ is discontinuous only at one endpoint of $[a,b]$, say at $b$. In that case, $f$ is
continuous and therefore integrable on any smaller interval $[a,r]$ with $a < r < b$,
and Proposition 11.50 implies that $f$ is integrable on $[a,b]$.


Example 11.54. Define $f : [0,2\pi] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} \sin(1/\sin x) & \text{if } x \ne 0, \pi, 2\pi, \\ 0 & \text{if } x = 0, \pi, 2\pi. \end{cases}
\]
Then $f$ is bounded and continuous except at $x = 0, \pi, 2\pi$, so it is integrable on $[0,2\pi]$
(see Figure 3). This function doesn’t have jump discontinuities, but Theorem 11.53
still applies.


Figure 3. Graph of the Riemann integrable function $y = \sin(1/\sin x)$ in Example 11.54.


Figure 4. Graph of the Riemann integrable function $y = \operatorname{sgn}(\sin(1/x))$ in Example 11.55.


Example 11.55. Define $f : [0,1/\pi] \to \mathbb{R}$ by
\[
f(x) = \begin{cases} \operatorname{sgn}\left[\sin(1/x)\right] & \text{if } x \ne 1/(n\pi) \text{ for } n \in \mathbb{N}, \\ 0 & \text{if } x = 0 \text{ or } x = 1/(n\pi) \text{ for } n \in \mathbb{N}, \end{cases}
\]
where sgn is the sign function,
\[
\operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}
\]
Then $f$ oscillates between $1$ and $-1$ a countably infinite number of times as $x \to 0^+$
(see Figure 4). It has jump discontinuities at $x = 1/(n\pi)$ and an essential
discontinuity at $x = 0$. Nevertheless, it is Riemann integrable. To see this, note that
$f$ is bounded on $[0,1/\pi]$ and piecewise continuous with finitely many discontinuities
on $[r,1/\pi]$ for every $0 < r < 1/\pi$. Theorem 11.53 implies that $f$ is Riemann integrable
on $[r,1/\pi]$, and then the left-endpoint analogue of Proposition 11.50 implies that $f$ is
integrable on $[0,1/\pi]$.


11.7. * Riemann sums 


Instead of using upper and lower sums, we can give an equivalent definition of the 
Riemann integral as a limit of Riemann sums. This was, in fact, Riemann’s original 
definition [II], which he gave in 1854 in his Habilitationsschrift (a kind of post- 
doctoral dissertation required of German academics), building on previous work of 
Cauchy who defined the integral for continuous functions. 


It is interesting to note that the topic of Riemann’s Habilitationsschrift was 
not integration theory, but Fourier series. Riemann introduced a definition of the 
integral along the way so that he could state his results more precisely. Many of 
the fundamental developments of rigorous real analysis in the nineteenth century 
were motivated by problems related to Fourier series and their convergence. 


Upper and lower sums were introduced subsequently by Darboux, and they
simplify the theory. We won’t use Riemann sums here, but we will explain the
equivalence of the definitions. We’ll say, temporarily, that a function is Darboux
integrable if it satisfies Definition 11.11.


To give Riemann’s definition, we first define a tagged partition $(P, C)$ of a
compact interval $[a,b]$ to be a partition
\[
P = \{I_1, I_2, \dots, I_n\}
\]
of the interval together with a set
\[
C = \{c_1, c_2, \dots, c_n\}
\]
of points such that $c_k \in I_k$ for $k = 1, \dots, n$. (We think of the point $c_k$ as a “tag”
attached to the interval $I_k$.)


If $f : [a,b] \to \mathbb{R}$, then we define the Riemann sum of $f$ with respect to the
tagged partition $(P, C)$ by
\[
S(f;P,C) = \sum_{k=1}^n f(c_k) |I_k|.
\]
That is, instead of using the supremum or infimum of $f$ on the $k$th interval in the
sum, we evaluate $f$ at a point in the interval. Roughly speaking, a function is
Riemann integrable if its Riemann sums approach the same value as the partition
is refined, independently of how we choose the points $c_k \in I_k$.

As a measure of the refinement of a partition $P = \{I_1, I_2, \dots, I_n\}$, we define
the mesh (or norm) of $P$ to be the maximum length of its intervals,
\[
\operatorname{mesh}(P) = \max_{1 \le k \le n} |I_k| = \max_{1 \le k \le n} |x_k - x_{k-1}|.
\]


Definition 11.56. A function $f : [a,b] \to \mathbb{R}$ is Riemann integrable on $[a,b]$ if there
exists a number $R \in \mathbb{R}$ with the following property: For every $\epsilon > 0$ there is a $\delta > 0$
such that
\[
|S(f;P,C) - R| < \epsilon
\]
for every tagged partition $(P, C)$ of $[a,b]$ with $\operatorname{mesh}(P) < \delta$. In that case, $R = \int_a^b f$
is the Riemann integral of $f$ on $[a,b]$.


Note that 
L(f; P) < S(f; P,C) < U(f; P), 
so the Riemann sums are “squeezed” between the upper and lower sums. The 
following theorem shows that the Darboux and Riemann definitions lead to the 
same notion of the integral, so it’s a matter of convenience which definition we 
adopt as our starting point. 
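
The squeeze between lower and upper sums can be observed numerically. The following Python sketch is an illustration under stated assumptions: the names `riemann_sum` and `square` are ours, the tags are drawn pseudo-randomly with a fixed seed, and the function $x^2$ on $[0,1]$ is chosen because it is increasing, so that tagging each interval at its left or right endpoint reproduces the lower or upper sum.

```python
import random


def riemann_sum(f, endpoints, tags):
    """S(f; P, C): sum of f(c_k) * |I_k| for a tagged partition."""
    return sum(f(c) * (right - left)
               for left, right, c in zip(endpoints, endpoints[1:], tags))


def square(x):
    return x * x


n = 50
endpoints = [k / n for k in range(n + 1)]
rng = random.Random(0)  # fixed seed so the run is reproducible
tags = [rng.uniform(left, right) for left, right in zip(endpoints, endpoints[1:])]

# square is increasing on [0, 1], so endpoint tags give the lower and upper sums.
lower = riemann_sum(square, endpoints, endpoints[:-1])
upper = riemann_sum(square, endpoints, endpoints[1:])
tagged = riemann_sum(square, endpoints, tags)
print(lower, tagged, upper)
```

Whatever tags are chosen, the tagged sum lies between the two endpoint sums, and all three approach 1/3 as the mesh 1/n tends to zero.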


Theorem 11.57. A function is Riemann integrable (in the sense of Definition 11.56)
if and only if it is Darboux integrable (in the sense of Definition 11.11). Furthermore,
in that case, the Riemann and Darboux integrals of the function are equal.


Proof. First, suppose that $f : [a,b] \to \mathbb{R}$ is Riemann integrable with integral $R$.
Then $f$ is bounded on $[a,b]$; otherwise, it would be unbounded in some interval $I_k$
of every partition $P$, and its Riemann sums with respect to $P$ would be arbitrarily
large for suitable points $c_k \in I_k$, so no $R \in \mathbb{R}$ could satisfy Definition 11.56.

Let $\epsilon > 0$. Since $f$ is Riemann integrable, there is a partition $P = \{I_1, I_2, \dots, I_n\}$
of $[a,b]$ such that
\[
|S(f;P,C) - R| < \frac{\epsilon}{2}
\]
for every set of points $C = \{c_k \in I_k : k = 1, \dots, n\}$. If $M_k = \sup_{I_k} f$, then there
exists $c_k \in I_k$ such that
\[
M_k - \frac{\epsilon}{2(b-a)} < f(c_k).
\]
It follows that
\[
\sum_{k=1}^n M_k |I_k| - \frac{\epsilon}{2} < \sum_{k=1}^n f(c_k) |I_k|,
\]
meaning that $U(f;P) - \epsilon/2 < S(f;P,C)$. Since $S(f;P,C) < R + \epsilon/2$, we get that
\[
U(f) \le U(f;P) < R + \epsilon.
\]
Similarly, if $m_k = \inf_{I_k} f$, then there exists $c_k \in I_k$ such that
\[
m_k + \frac{\epsilon}{2(b-a)} > f(c_k), \qquad \sum_{k=1}^n m_k |I_k| + \frac{\epsilon}{2} > \sum_{k=1}^n f(c_k) |I_k|,
\]
and $L(f;P) + \epsilon/2 > S(f;P,C)$. Since $S(f;P,C) > R - \epsilon/2$, we get that
\[
L(f) \ge L(f;P) > R - \epsilon.
\]
These inequalities imply that
\[
L(f) + \epsilon > R > U(f) - \epsilon
\]
for every $\epsilon > 0$, and therefore $L(f) \ge R \ge U(f)$. Since $L(f) \le U(f)$, we conclude
that $L(f) = R = U(f)$, so $f$ is Darboux integrable with integral $R$.




Conversely, suppose that $f$ is Darboux integrable. The main point is to show
that if $\epsilon > 0$, then $U(f;P) - L(f;P) < \epsilon$ not just for some partition but for every
partition whose mesh is sufficiently small.

Let $\epsilon > 0$ be given. Since $f$ is Darboux integrable, there exists a partition $Q$
such that
\[
U(f;Q) - L(f;Q) < \frac{\epsilon}{4}.
\]
Suppose that $Q$ contains $m$ intervals and $|f| \le M$ on $[a,b]$. We claim that if
\[
\delta = \frac{\epsilon}{8mM},
\]
then $U(f;P) - L(f;P) < \epsilon$ for every partition $P$ with $\operatorname{mesh}(P) < \delta$.

To prove this claim, suppose that $P = \{I_1, I_2, \dots, I_n\}$ is a partition with
$\operatorname{mesh}(P) < \delta$. Let $P'$ be the smallest common refinement of $P$ and $Q$, so that
the endpoints of $P'$ consist of the endpoints of $P$ or $Q$. Since $a$, $b$ are common
endpoints of $P$ and $Q$, there are at most $m - 1$ endpoints of $Q$ that are distinct
from endpoints of $P$. Therefore, at most $m - 1$ intervals in $P$ contain additional
endpoints of $Q$ and are strictly refined in $P'$, meaning that they are the union of
two or more intervals in $P'$.

Now consider $U(f;P) - U(f;P')$. The terms that correspond to the same,
unrefined intervals in $P$ and $P'$ cancel. If $I_k$ is a strictly refined interval in $P$, then
the corresponding terms in each of the sums $U(f;P)$ and $U(f;P')$ can be estimated
by $M|I_k|$ and their difference by $2M|I_k|$. There are at most $m - 1$ such intervals
and $|I_k| < \delta$, so it follows that
\[
U(f;P) - U(f;P') < 2(m-1)M\delta < \frac{\epsilon}{4}.
\]
Since $P'$ is a refinement of $Q$, we get
\[
U(f;P) < U(f;P') + \frac{\epsilon}{4} \le U(f;Q) + \frac{\epsilon}{4} < L(f;Q) + \frac{\epsilon}{2}.
\]
It follows by a similar argument that
\[
L(f;P') - L(f;P) < \frac{\epsilon}{4},
\]
and
\[
L(f;P) > L(f;P') - \frac{\epsilon}{4} \ge L(f;Q) - \frac{\epsilon}{4} > U(f;Q) - \frac{\epsilon}{2}.
\]
Since $L(f;Q) \le U(f;Q)$, we conclude from these inequalities that
\[
U(f;P) - L(f;P) < \epsilon
\]
for every partition $P$ with $\operatorname{mesh}(P) < \delta$.

If $D$ denotes the Darboux integral of $f$, then we have
\[
L(f;P) \le D \le U(f;P), \qquad L(f;P) \le S(f;P,C) \le U(f;P).
\]
Since $U(f;P) - L(f;P) < \epsilon$ for every partition $P$ with $\operatorname{mesh}(P) < \delta$, it follows that
\[
|S(f;P,C) - D| < \epsilon.
\]
Thus, $f$ is Riemann integrable with Riemann integral $D$.




Finally, we give a necessary and sufficient condition for Riemann integrability 
that was proved by Riemann himself (1854). (See [5] for further discussion.) To 
state the condition, we introduce some notation. 


Let f : [a,b] → R be a bounded function. If $P = \{I_1, I_2, \dots, I_n\}$ is a partition
of [a,b] and ε > 0, let $A_\epsilon(P) \subset \{1,\dots,n\}$ be the set of indices k such that
\[
\operatorname*{osc}_{I_k} f = \sup_{I_k} f - \inf_{I_k} f \ge \epsilon \qquad \text{for } k \in A_\epsilon(P).
\]
Similarly, let $B_\epsilon(P) \subset \{1,\dots,n\}$ be the set of indices such that
\[
\operatorname*{osc}_{I_k} f < \epsilon \qquad \text{for } k \in B_\epsilon(P).
\]
That is, the oscillation of f on $I_k$ is “large” if $k \in A_\epsilon(P)$ and “small” if $k \in B_\epsilon(P)$.
We denote the sum of the lengths of the intervals in P where the oscillation of f is
“large” by
\[
s_\epsilon(P) = \sum_{k \in A_\epsilon(P)} |I_k|.
\]
Fixing ε > 0, we say that $s_\epsilon(P) \to 0$ as mesh(P) → 0 if for every η > 0 there exists
δ > 0 such that mesh(P) < δ implies that $s_\epsilon(P) < \eta$.


Theorem 11.58. A function is Riemann integrable if and only if $s_\epsilon(P) \to 0$ as
mesh(P) → 0 for every ε > 0.


Proof. Let f : [a,b] → R be bounded, with |f| ≤ M on [a,b].


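First, suppose that the condition holds, and let ε > 0. If P is a partition of
[a,b], then, using the notation above for $A_\epsilon(P)$, $B_\epsilon(P)$ and the inequality
\[
0 \le \operatorname*{osc}_{I_k} f \le 2M,
\]
we get that
\begin{align*}
U(f;P) - L(f;P) &= \sum_{k=1}^{n} \operatorname*{osc}_{I_k} f \cdot |I_k| \\
&= \sum_{k \in A_\epsilon(P)} \operatorname*{osc}_{I_k} f \cdot |I_k| + \sum_{k \in B_\epsilon(P)} \operatorname*{osc}_{I_k} f \cdot |I_k| \\
&\le 2M \sum_{k \in A_\epsilon(P)} |I_k| + \epsilon \sum_{k \in B_\epsilon(P)} |I_k| \\
&\le 2M s_\epsilon(P) + \epsilon (b-a).
\end{align*}
By assumption, there exists δ > 0 such that $s_\epsilon(P) < \epsilon$ if mesh(P) < δ, in which
case
\[
U(f;P) - L(f;P) < \epsilon (2M + b - a).
\]
The Cauchy criterion in Theorem 11.23 then implies that f is integrable.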
Conversely, suppose that f is integrable, and let ε > 0 be given. If P is a
partition, we can bound $s_\epsilon(P)$ from above by the difference between the upper and
lower sums as follows:
\[
U(f;P) - L(f;P) \ge \sum_{k \in A_\epsilon(P)} \operatorname*{osc}_{I_k} f \cdot |I_k| \ge \epsilon \sum_{k \in A_\epsilon(P)} |I_k| = \epsilon\, s_\epsilon(P).
\]
Since f is integrable, for every η > 0 there exists δ > 0 such that mesh(P) < δ
implies that
\[
U(f;P) - L(f;P) < \epsilon \eta.
\]
Therefore, mesh(P) < δ implies that
\[
s_\epsilon(P) \le \frac{1}{\epsilon} \left[ U(f;P) - L(f;P) \right] < \eta,
\]
which proves the result.


This theorem has the drawback that the necessary and sufficient condition 
for Riemann integrability is somewhat complicated and, in general, isn’t easy to 
verify. In the next section, we state a simpler necessary and sufficient condition for 
Riemann integrability. 


11.8. * The Lebesgue criterion 


Although the Dirichlet function in Example 11.15 is not Riemann integrable, it is
Lebesgue integrable. Its Lebesgue integral is given by
\[
\int_0^1 f = 1 \cdot |A| + 0 \cdot |B|,
\]
where A = [0,1] ∩ Q is the set of rational numbers in [0,1], B = [0,1] \ Q is the
set of irrational numbers, and |E| denotes the Lebesgue measure of a set E. The
Lebesgue measure of a subset of R is a generalization of the length of an interval
which applies to more general sets. It turns out that |A| = 0 (as is true for any
countable set of real numbers; see Example below) and |B| = 1. Thus, the
Lebesgue integral of the Dirichlet function is 0.

A necessary and sufficient condition for Riemann integrability can be given in 
terms of Lebesgue measure. To state this condition, we first define what it means 
for a set to have Lebesgue measure zero. 


Definition 11.59. A set E ⊂ R has Lebesgue measure zero if for every ε > 0 there
is a countable collection of open intervals $\{(a_k, b_k) : k \in \mathbb{N}\}$ such that
\[
E \subset \bigcup_{k=1}^{\infty} (a_k, b_k), \qquad \sum_{k=1}^{\infty} (b_k - a_k) < \epsilon.
\]


The open intervals in this definition are not required to be disjoint, and they
may “overlap.”


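Example 11.60. Every countable set $E = \{x_k \in \mathbb{R} : k \in \mathbb{N}\}$ has Lebesgue measure
zero. To prove this, let ε > 0 and for each k ∈ N define
\[
a_k = x_k - \frac{\epsilon}{2^{k+2}}, \qquad b_k = x_k + \frac{\epsilon}{2^{k+2}}.
\]
Then $E \subset \bigcup_{k=1}^{\infty} (a_k, b_k)$ since $x_k \in (a_k, b_k)$ and
\[
\sum_{k=1}^{\infty} (b_k - a_k) = \sum_{k=1}^{\infty} \frac{\epsilon}{2^{k+1}} = \frac{\epsilon}{2} < \epsilon,
\]
so the Lebesgue measure of E is equal to zero. (The ‘$\epsilon/2^k$’ trick used here is a
common one in measure theory.)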
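
As a small computational illustration of this trick, the following Python sketch covers the first 50 rationals in [0,1] by open intervals whose total length stays below a prescribed ε. The enumeration order, the half-widths $\epsilon/2^{k+2}$ (one admissible choice), and the function names are assumptions made for the illustration; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """Return the first `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Interval k (k = 1, 2, ...) is centred at x_k with half-width eps / 2**(k+2)."""
    eps = Fraction(eps)
    return [(x - eps / 2 ** (k + 2), x + eps / 2 ** (k + 2))
            for k, x in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)  # stays below eps / 2
```

The total length is bounded by ε/2 no matter how many points are covered, which is the content of the geometric-series estimate in the example.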


If E = [0,1] ∩ Q consists of the rational numbers in [0,1], then the set
$G = \bigcup_{k=1}^{\infty} (a_k, b_k)$ described above encloses the dense set of rationals in a collection of
open intervals the sum of whose lengths is arbitrarily small. This set isn’t so easy
to visualize. Roughly speaking, if ε is small and we look at a section of [0,1] at a
given magnification, then we see a few of the longer intervals in G with relatively 
large gaps between them. Magnifying one of these gaps, we see a few more intervals 
with large gaps between them, magnifying those gaps, we see a few more intervals, 
and so on. Thus, the set G has a fractal structure, meaning that it looks similar at 
all scales of magnification. 


In general, we have the following result, due to Lebesgue, which we state with- 
out proof. 


Theorem 11.61. A function f : [a,b] → R is Riemann integrable if and only if it
is bounded and the set of points at which it is discontinuous has Lebesgue measure
zero.


For example, the set of discontinuities of the Riemann-integrable function in
Example 11.14 consists of a single point {0}, which has Lebesgue measure zero. On
the other hand, the set of discontinuities of the non-Riemann-integrable Dirichlet
function in Example 11.15 is the entire interval [0,1], which has Lebesgue measure
one.


In particular, every bounded function with a countable set of discontinuities is 
Riemann integrable, since such a set has Lebesgue measure zero. Riemann integra- 
bility of a function does not, however, imply that the function has only countably 
many discontinuities. 


Example 11.62. The Cantor set C in Example 5.64 has Lebesgue measure zero.
To prove this, using the same notation as in Section we note that for every
n ∈ N the set $F_n \supset C$ consists of $2^n$ closed intervals $I_s$ of length $|I_s| = 3^{-n}$. For
every ε > 0 and $s \in \Sigma_n$, there is an open interval $U_s$ of slightly larger length
$|U_s| = 3^{-n} + \epsilon 2^{-n}$ that contains $I_s$. Then $\{U_s : s \in \Sigma_n\}$ is a cover of C by open
intervals, and
\[
\sum_{s \in \Sigma_n} |U_s| = \left(\frac{2}{3}\right)^n + \epsilon.
\]
We can make the right-hand side as small as we wish by choosing n large enough
and ε small enough, so C has Lebesgue measure zero.


Let $\chi_C : [0,1] \to \mathbb{R}$ be the characteristic function of the Cantor set,
\[
\chi_C(x) = \begin{cases} 1 & \text{if } x \in C, \\ 0 & \text{otherwise.} \end{cases}
\]
By partitioning [0,1] into the closed intervals $\{\overline{U}_s : s \in \Sigma_n\}$ and the closures of the
complementary intervals, we see similarly that the upper Riemann sums of $\chi_C$ can
be made arbitrarily small, so $\chi_C$ is Riemann integrable on [0,1] with zero integral.
The Riemann integrability of the function $\chi_C$ also follows from Theorem




It is, however, discontinuous at every point of C. Thus, $\chi_C$ is an example of a
Riemann integrable function with uncountably many discontinuities.


—— 
Chapter 12 


Properties and Applications of 
the Integral 


In the integral calculus I find much less interesting the parts that involve 
only substitutions, transformations, and the like, in short, the parts that 
involve the known skillfully applied mechanics of reducing integrals to 
algebraic, logarithmic, and circular functions, than I find the careful and 
profound study of transcendental functions that cannot be reduced to 
these functions. (Gauss, 1808) 


12.1. The fundamental theorem of calculus 


The fundamental theorem of calculus states that differentiation and integration 
are inverse operations in an appropriately understood sense. The theorem has two 
parts: in one direction, it says roughly that the integral of the derivative is the 
original function; in the other direction, it says that the derivative of the integral 
is the original function. 

In more detail, the first part states that if F : [a,b] → R is differentiable with
integrable derivative, then
\[
\int_a^b F'(x)\,dx = F(b) - F(a).
\]
This result can be thought of as a continuous analog of the corresponding identity
for sums of differences,
\[
\sum_{k=1}^{n} (A_k - A_{k-1}) = A_n - A_0.
\]
The second part states that if f : [a,b] → R is continuous, then
\[
\frac{d}{dx} \int_a^x f(t)\,dt = f(x).
\]
This is a continuous analog of the corresponding identity for differences of sums,
\[
\sum_{j=1}^{k} a_j - \sum_{j=1}^{k-1} a_j = a_k.
\]


The proof of the fundamental theorem consists essentially of applying the iden- 
tities for sums or differences to the appropriate Riemann sums or difference quo- 
tients and proving, under appropriate hypotheses, that they converge to the corre- 
sponding integrals or derivatives. 


We'll split the statement and proof of the fundamental theorem into two parts. 
(The numbering of the parts as I and II is arbitrary.) 


12.1.1. Fundamental theorem I. First we prove the statement about the in- 
tegral of a derivative. 


Theorem 12.1 (Fundamental theorem of calculus I). If F : [a,b] → R is continuous
on [a,b] and differentiable in (a,b) with F’ = f where f : [a,b] → R is Riemann
integrable, then
\[
\int_a^b f(x)\,dx = F(b) - F(a).
\]


Proof. Let
\[
P = \{x_0, x_1, x_2, \dots, x_{n-1}, x_n\}
\]
be a partition of [a,b], with $x_0 = a$ and $x_n = b$. Then
\[
F(b) - F(a) = \sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right].
\]
The function F is continuous on the closed interval $[x_{k-1}, x_k]$ and differentiable in
the open interval $(x_{k-1}, x_k)$ with F’ = f. By the mean value theorem, there exists
$x_{k-1} < c_k < x_k$ such that
\[
F(x_k) - F(x_{k-1}) = f(c_k)(x_k - x_{k-1}).
\]
Since f is Riemann integrable, it is bounded, and
\[
m_k (x_k - x_{k-1}) \le F(x_k) - F(x_{k-1}) \le M_k (x_k - x_{k-1}),
\]
where
\[
M_k = \sup_{[x_{k-1}, x_k]} f, \qquad m_k = \inf_{[x_{k-1}, x_k]} f.
\]
Hence, $L(f;P) \le F(b) - F(a) \le U(f;P)$ for every partition P of [a,b], which implies
that $L(f) \le F(b) - F(a) \le U(f)$. Since f is integrable, $L(f) = U(f) = \int_a^b f$ and
therefore $F(b) - F(a) = \int_a^b f$.
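
The squeeze in this proof can be observed numerically. The sketch below is only an illustration under assumed choices (F(x) = x³ on [0,1], uniform partitions, and the helper name `lower_upper_sums`): because f = F’ is increasing there, its infimum and supremum on each subinterval are attained at the left and right endpoints, so the Darboux sums are easy to compute.

```python
def lower_upper_sums(f, a, b, n):
    """Lower and upper Darboux sums of an increasing function f on n equal subintervals."""
    h = (b - a) / n
    xs = [a + k * h for k in range(n + 1)]
    lower = sum(f(xs[k - 1]) * h for k in range(1, n + 1))  # inf at the left endpoint
    upper = sum(f(xs[k]) * h for k in range(1, n + 1))      # sup at the right endpoint
    return lower, upper

def F(x):
    return x ** 3        # antiderivative

def f(x):
    return 3 * x ** 2    # F' = f, increasing on [0, 1]

results = [lower_upper_sums(f, 0.0, 1.0, n) for n in (10, 100, 1000)]
exact = F(1.0) - F(0.0)
```

Each pair in `results` brackets `exact`, and the gap between the two sums is 3/n for this choice of f, so it shrinks as the partition is refined.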


In Theorem 12.1 we assume that F is continuous on the closed interval [a,b]
and differentiable in the open interval (a,b) where its usual two-sided derivative
is defined and is equal to f. It isn’t necessary to assume the existence of the
right derivative of F at a or the left derivative at b, so the values of f at the
endpoints are not necessarily determined by F. By Proposition however,
the integrability of f on [a,b] and the value of its integral do not depend on these
values, so the statement of the theorem makes sense. As a result, we’ll sometimes
abuse terminology and say that “F’ is integrable on [a,b]” even if it’s only defined
on (a,b).

Theorem 12.1 imposes the integrability of F’ as a hypothesis. Every function F
that is continuously differentiable on the closed interval [a,b] satisfies this condition,
but the theorem remains true even if F’ is a discontinuous, Riemann integrable
function.


Example 12.2. Define F : [0,1] → R by
\[
F(x) = \begin{cases} x^2 \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]
Then F is continuous on [0,1] and, by the product and chain rules, differentiable
in (0,1]. It is also differentiable (but not continuously differentiable) at 0, with
$F'(0^+) = 0$. Thus,
\[
F'(x) = \begin{cases} -\cos(1/x) + 2x \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]
The derivative F’ is bounded on [0,1] and discontinuous only at one point (x = 0),
so Theorem 11.53 implies that F’ is integrable on [0,1]. This verifies all of the
hypotheses in Theorem 12.1, and we conclude that
\[
\int_0^1 F'(x)\,dx = \sin 1.
\]


There are, however, differentiable functions whose derivatives are unbounded 
or so discontinuous that they aren’t Riemann integrable. 


Example 12.3. Define F : [0,1] → R by $F(x) = \sqrt{x}$. Then F is continuous on
[0,1] and differentiable in (0,1], with
\[
F'(x) = \frac{1}{2\sqrt{x}} \qquad \text{for } 0 < x \le 1.
\]
This function is unbounded, so F’ is not Riemann integrable on [0,1], however we
define its value at 0, and Theorem does not apply.

We can interpret the integral of F’ on [0,1] as an improper Riemann integral (as
is discussed further in Section 12.4). The function F is continuously differentiable
on [ε,1] for every 0 < ε < 1, so
\[
\int_\epsilon^1 \frac{1}{2\sqrt{x}}\,dx = 1 - \sqrt{\epsilon}.
\]
Thus, we get the improper integral
\[
\lim_{\epsilon \to 0^+} \int_\epsilon^1 \frac{1}{2\sqrt{x}}\,dx = 1.
\]
The construction of a function with a bounded, non-integrable derivative is 
more involved. It’s not sufficient to give a function with a bounded derivative that 
is discontinuous at finitely many points, as in Example because such a function 
is Riemann integrable. Rather, one has to construct a differentiable function whose 




derivative is discontinuous on a set of nonzero Lebesgue measure. Abbott [1] gives 
an example. 

Finally, we remark that Theorem remains valid for the oriented Riemann 
integral, since exchanging a and b reverses the sign of both sides. 


12.1.2. Fundamental theorem of calculus II. Next, we prove the other direc- 
tion of the fundamental theorem. We will use the following result, of independent 
interest, which states that the average of a continuous function on an interval ap- 
proaches the value of the function as the length of the interval shrinks to zero. The 
proof uses a common trick of taking a constant inside an average. 


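Theorem 12.4. Suppose that f : [a,b] → R is integrable on [a,b] and continuous
at a. Then
\[
\lim_{h \to 0^+} \frac{1}{h} \int_a^{a+h} f(x)\,dx = f(a).
\]

Proof. If k is a constant, then we have
\[
k = \frac{1}{h} \int_a^{a+h} k\,dx.
\]
(That is, the average of a constant is equal to the constant.) We can therefore write
\[
\frac{1}{h} \int_a^{a+h} f(x)\,dx - f(a) = \frac{1}{h} \int_a^{a+h} \left[ f(x) - f(a) \right] dx.
\]
Let ε > 0. Since f is continuous at a, there exists δ > 0 such that
\[
|f(x) - f(a)| < \epsilon \qquad \text{for } a \le x < a + \delta.
\]
It follows that if 0 < h < δ, then
\[
\left| \frac{1}{h} \int_a^{a+h} f(x)\,dx - f(a) \right| \le \frac{1}{h} \cdot \sup_{a \le x \le a+h} |f(x) - f(a)| \cdot h \le \epsilon,
\]
which proves the result.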


A similar proof shows that if f is continuous at b, then
\[
\lim_{h \to 0^+} \frac{1}{h} \int_{b-h}^{b} f(x)\,dx = f(b),
\]
and if f is continuous at a < c < b, then
\[
\lim_{h \to 0^+} \frac{1}{2h} \int_{c-h}^{c+h} f(x)\,dx = f(c).
\]
More generally, if f is continuous at c and $\{I_h : h > 0\}$ is any collection of intervals
with $c \in I_h$ and $|I_h| \to 0$ as $h \to 0^+$, then
\[
\lim_{h \to 0^+} \frac{1}{|I_h|} \int_{I_h} f = f(c).
\]
The assumption in Theorem that f is continuous at the point about which we
take the averages is essential.
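
To see the convergence of averages concretely, here is a brief Python sketch. The choice f = exp, the base point a = 0, and the midpoint-rule helper `average` are assumptions made for the illustration; for this f the exact average over [0,h] is $(e^h - 1)/h$, which tends to f(0) = 1.

```python
import math

def average(f, a, h, n=10_000):
    """Midpoint-rule approximation of (1/h) * integral of f over [a, a + h]."""
    step = h / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step / h

f = math.exp
a = 0.0
# Distance between the average over [a, a + h] and f(a) for shrinking h.
errors = [abs(average(f, a, h) - f(a)) for h in (1.0, 0.1, 0.01, 0.001)]
```

The entries of `errors` decrease roughly in proportion to h, as one expects from $(e^h - 1)/h - 1 \approx h/2$.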




Example 12.5. Let f : R → R be the sign function
\[
f(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}
\]
Then
\[
\lim_{h \to 0^+} \frac{1}{h} \int_0^h f(x)\,dx = 1, \qquad \lim_{h \to 0^+} \frac{1}{h} \int_{-h}^0 f(x)\,dx = -1,
\]
and neither limit is equal to f(0). In this example, the limit of the symmetric
averages
\[
\lim_{h \to 0^+} \frac{1}{2h} \int_{-h}^{h} f(x)\,dx = 0
\]
is equal to f(0), but this equality doesn’t hold if we change f(0) to a nonzero value
(for example, if f(x) = 1 for x ≥ 0 and f(x) = −1 for x < 0) since the limit of the
symmetric averages is still 0.


The second part of the fundamental theorem follows from this result and the 
fact that the difference quotients of F are averages of f. 


Theorem 12.6 (Fundamental theorem of calculus II). Suppose that f : [a,b] → R
is integrable and F : [a,b] → R is defined by
\[
F(x) = \int_a^x f(t)\,dt.
\]
Then F is continuous on [a,b]. Moreover, if f is continuous at a ≤ c ≤ b, then F is
differentiable at c and F’(c) = f(c).

Proof. First, note that Theorem 11.44 implies that f is integrable on [a,x] for
every a ≤ x ≤ b, so F is well-defined. Since f is Riemann integrable, it is bounded,
and |f| ≤ M for some M > 0. It follows that
\[
|F(x+h) - F(x)| = \left| \int_x^{x+h} f(t)\,dt \right| \le M|h|,
\]
which shows that F is continuous on [a,b] (in fact, Lipschitz continuous).

Moreover, we have
\[
\frac{F(c+h) - F(c)}{h} = \frac{1}{h} \int_c^{c+h} f(t)\,dt.
\]
It follows from Theorem that if f is continuous at c, then F is differentiable
at c with
\[
F'(c) = \lim_{h \to 0} \left[ \frac{F(c+h) - F(c)}{h} \right] = \lim_{h \to 0} \frac{1}{h} \int_c^{c+h} f(t)\,dt = f(c),
\]
where we use the appropriate right or left limit at an endpoint.


The assumption that f is continuous is needed to ensure that F is differentiable. 




Example 12.7. If
\[
f(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0, \end{cases}
\]
then
\[
F(x) = \int_0^x f(t)\,dt = \begin{cases} x & \text{for } x \ge 0, \\ 0 & \text{for } x < 0. \end{cases}
\]
The function F is continuous but not differentiable at x = 0, where f is
discontinuous, since the left and right derivatives of F at 0, given by $F'(0^-) = 0$ and
$F'(0^+) = 1$, are different.
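
The failure of differentiability at a jump of f shows up directly in one-sided difference quotients. The following Python sketch uses the function F of this example; the helper `difference_quotient` and the sample step sizes are illustrative assumptions.

```python
def F(x):
    """F(x) = integral from 0 to x of the step function (1 for x >= 0, 0 for x < 0)."""
    return x if x >= 0 else 0.0

def difference_quotient(G, c, h):
    """(G(c + h) - G(c)) / h for a nonzero increment h of either sign."""
    return (G(c + h) - G(c)) / h

steps = (0.1, 0.01, 0.001)
right = [difference_quotient(F, 0.0, h) for h in steps]   # quotients with h > 0
left = [difference_quotient(F, 0.0, -h) for h in steps]   # quotients with h < 0
```

The right quotients are all equal to 1 and the left quotients are all equal to 0, so the two-sided limit defining F’(0) does not exist.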


12.2. Consequences of the fundamental theorem 


The first part of the fundamental theorem, Theorem is the basic tool for
the exact evaluation of integrals. It allows us to compute the integral of a
function f if we can find an antiderivative; that is, a function F such that F’ = f.
There is no systematic procedure for finding antiderivatives. Moreover, an
antiderivative of an elementary function (constructed from power, trigonometric, 
and exponential functions and their inverses) need not be — and often isn’t — 
expressible in terms of elementary functions. By contrast, the rules of differentiation 
provide a mechanical algorithm for the computation of the derivative of any function 
formed from elementary functions by algebraic operations and compositions. 


Example 12.8. For p = 0, 1, 2, ..., we have
\[
\frac{d}{dx} \left[ \frac{1}{p+1} x^{p+1} \right] = x^p,
\]
and it follows that
\[
\int_0^1 x^p\,dx = \frac{1}{p+1}.
\]

Example 12.9. We can use the fundamental theorem to evaluate certain limits of
sums. For example,
\[
\lim_{n \to \infty} \left[ \frac{1}{n^{p+1}} \sum_{k=1}^{n} k^p \right] = \frac{1}{p+1},
\]
since the sum on the left-hand side is the upper sum of $x^p$ on a partition of [0,1]
into n intervals of equal length. Example 11.27 illustrates this result explicitly for
p = 2.
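
A quick numerical check of this kind of limit is sketched below, using the upper sum of $x^p$ on n equal subintervals of [0,1], namely $n^{-(p+1)} \sum_{k=1}^n k^p$. The function name `scaled_power_sum` and the value n = 10000 are arbitrary choices for the illustration.

```python
def scaled_power_sum(p, n):
    """Upper sum of x**p on [0, 1] with n equal subintervals: (1/n**(p+1)) * sum of k**p."""
    return sum(k ** p for k in range(1, n + 1)) / n ** (p + 1)

n = 10_000
values = {p: scaled_power_sum(p, n) for p in range(5)}
limits = {p: 1 / (p + 1) for p in range(5)}
```

For each p the computed value lies slightly above 1/(p+1), as an upper sum should, and the excess is of order 1/n.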


Two important general consequences of the first part of the fundamental theo- 
rem are integration by parts and substitution (or change of variable), which come 
from inverting the product rule and chain rule for derivatives, respectively. 


Theorem 12.10 (Integration by parts). Suppose that f, g : [a,b] → R are
continuous on [a,b] and differentiable in (a,b), and f’, g’ are integrable on [a,b]. Then
\[
\int_a^b f g'\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f' g\,dx.
\]




Proof. The function fg is continuous on [a,b] and, by the product rule,
differentiable in (a,b) with derivative
\[
(fg)' = f g' + f' g.
\]
Since f, g, f’, g’ are integrable on [a,b], Theorem 11.35 implies that fg’, f’g, and
(fg)’ are integrable. From Theorem we get that
\[
\int_a^b f g'\,dx + \int_a^b f' g\,dx = \int_a^b (fg)'\,dx = f(b)g(b) - f(a)g(a),
\]
which proves the result.


Integration by parts says that we can move a derivative from one factor in 
an integral onto the other factor, with a change of sign and the appearance of 
a boundary term. The product rule for derivatives expresses the derivative of a 
product in terms of the derivatives of the factors. By contrast, integration by parts 
doesn’t give an explicit expression for the integral of a product; it simply replaces
one integral by another. This can sometimes be used to transform an integral into an
integral that is easier to evaluate, but the importance of integration by parts goes 
far beyond its use as an integration technique. 


Example 12.11. For n = 0, 1, 2, 3, ..., let
\[
I_n(x) = \int_0^x t^n e^{-t}\,dt.
\]
If n ≥ 1, then integration by parts with $f(t) = t^n$ and $g'(t) = e^{-t}$ gives
\[
I_n(x) = -x^n e^{-x} + n \int_0^x t^{n-1} e^{-t}\,dt = -x^n e^{-x} + n I_{n-1}(x).
\]
Also, by the fundamental theorem of calculus,
\[
I_0(x) = \int_0^x e^{-t}\,dt = 1 - e^{-x}.
\]
It then follows by induction that
\[
I_n(x) = n! \left[ 1 - e^{-x} \sum_{k=0}^{n} \frac{x^k}{k!} \right].
\]
Since $x^k e^{-x} \to 0$ as x → ∞ for every k = 0, 1, 2, ..., we get the improper
integral
\[
\int_0^\infty t^n e^{-t}\,dt = \lim_{r \to \infty} \int_0^r t^n e^{-t}\,dt = n!.
\]
This formula suggests an extension of the factorial function to complex numbers
z ∈ C, called the Gamma function, which is defined for Re z > 0 by the improper,
complex-valued integral
\[
\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt.
\]
In particular, Γ(n) = (n − 1)! for n ∈ N. The Gamma function is an important
special function, which is studied further in complex analysis.
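
The recursion obtained from integration by parts is easy to evaluate by machine. The sketch below is an illustration only: the function names `I` and `I_closed` are chosen here, and the closed form used for comparison is $n!\,[1 - e^{-x} \sum_{k=0}^{n} x^k/k!]$, which one can verify by induction from the recursion.

```python
import math

def I(n, x):
    """I_n(x) = integral from 0 to x of t**n * exp(-t) dt, via I_n = -x**n e**(-x) + n I_{n-1}."""
    if n == 0:
        return 1.0 - math.exp(-x)
    return -x ** n * math.exp(-x) + n * I(n - 1, x)

def I_closed(n, x):
    """Closed form n! * (1 - exp(-x) * sum_{k=0}^{n} x**k / k!)."""
    partial = sum(x ** k / math.factorial(k) for k in range(n + 1))
    return math.factorial(n) * (1.0 - math.exp(-x) * partial)

n = 5
values = [I(n, x) for x in (1.0, 10.0, 50.0)]  # increases towards 5! = 120
```

For n = 5 the values increase towards 120 as x grows, in agreement with the improper integral being n!.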


Next we consider the change of variable formula for integrals. 




Theorem 12.12 (Change of variable). Suppose that g : I → R is differentiable on
an open interval I and g’ is integrable on I. Let J = g(I). If f : J → R is continuous,
then for every a, b ∈ I,
\[
\int_a^b f(g(x))\, g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.
\]


Proof. For x ∈ J, let
\[
F(x) = \int_{g(a)}^{x} f(u)\,du.
\]
Since f is continuous, Theorem implies that F is differentiable in J with
F’ = f. The chain rule implies that the composition F ∘ g : I → R is differentiable
in I, with
\[
(F \circ g)'(x) = f(g(x))\, g'(x).
\]
This derivative is integrable on [a,b] since f ∘ g is continuous and g’ is integrable.
Theorem the definition of F, and the additivity of the integral then imply
that
\begin{align*}
\int_a^b f(g(x))\, g'(x)\,dx &= \int_a^b (F \circ g)'\,dx \\
&= F(g(b)) - F(g(a)) \\
&= \int_{g(a)}^{g(b)} F'(u)\,du,
\end{align*}
which proves the result.


There is no assumption in this theorem that g is invertible, but we often use the 
theorem in that case. A continuous function maps an interval to an interval, and it 
is one-to-one if and only if it is strictly monotone. An increasing function preserves 
the orientation of the interval, while a decreasing function reverses it, in which case 
the integrals in the previous theorem are understood as oriented integrals. 

This result can also be formulated in terms of non-oriented integrals. Suppose
that g : I → J is one-to-one and onto from an interval I = [a,b] to an interval
J = g(I) = [c,d], where c = g(a), d = g(b) if g is increasing, and c = g(b), d = g(a)
if g is decreasing. Then
\[
\int_J f(u)\,du = \int_I (f \circ g)(x)\, |g'(x)|\,dx.
\]
In this identity, both integrals are over positively oriented intervals and we include
an absolute value in the Jacobian factor |g’(x)|. If g’ ≥ 0, then this identity is the
same as the oriented form, while if g’ ≤ 0, then
\begin{align*}
\int_I (f \circ g)(x)\, |g'(x)|\,dx &= \int_a^b (f \circ g)(x) \left[ -g'(x) \right] dx \\
&= \int_{g(b)}^{g(a)} f(u)\,du \\
&= \int_J f(u)\,du.
\end{align*}


Example 12.13. For every a > 0, the increasing, differentiable function g : R → R
defined by $g(x) = x^3$ maps (−a,a) one-to-one and onto $(-a^3, a^3)$ and preserves
orientation. Thus, if f : [−a,a] → R is continuous,
\[
\int_{-a}^{a} f(x^3) \cdot 3x^2\,dx = \int_{-a^3}^{a^3} f(u)\,du.
\]
The decreasing, differentiable function g : R → R defined by $g(x) = -x^3$ maps
(−a,a) one-to-one and onto $(-a^3, a^3)$ and reverses orientation. Thus,
\[
\int_{-a}^{a} f(-x^3) \cdot (-3x^2)\,dx = \int_{a^3}^{-a^3} f(u)\,du = -\int_{-a^3}^{a^3} f(u)\,du.
\]
The non-monotone, differentiable function g : R → R defined by $g(x) = x^2$ maps
(−a,a) onto $[0, a^2)$. It is two-to-one, except at x = 0. The change of variables
formula gives
\[
\int_{-a}^{a} f(x^2) \cdot 2x\,dx = \int_{a^2}^{a^2} f(u)\,du = 0.
\]
The contributions to the original integral from [0,a] and [−a,0] cancel since the
integrand is an odd function of x.
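
These identities can be tested numerically. The sketch below assumes the particular choices f = cos and a = 1.5, and uses a simple midpoint rule (the helper `midpoint` is written for this illustration) to approximate the oriented integrals on both sides.

```python
import math

def midpoint(h, a, b, n=20_000):
    """Midpoint-rule approximation of the oriented integral of h from a to b."""
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step

f = math.cos
a = 1.5

# g(x) = x**3 is increasing and preserves orientation.
lhs_cubic = midpoint(lambda x: f(x ** 3) * 3 * x ** 2, -a, a)
rhs_cubic = midpoint(f, -a ** 3, a ** 3)

# g(x) = x**2 is not monotone and g(-a) = g(a), so the right-hand side is
# an integral over a degenerate interval and equals 0.
lhs_square = midpoint(lambda x: f(x ** 2) * 2 * x, -a, a)
```

The two approximations for the cubic substitution agree to within the quadrature error, and the integral for $g(x) = x^2$ vanishes up to rounding.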


One consequence of the second part of the fundamental theorem, Theorem 12.6,
is that every continuous function has an antiderivative, even if it can’t be expressed 
explicitly in terms of elementary functions. This provides a way to define transcen- 
dental functions as integrals of elementary functions. 


Example 12.14. One way to define the natural logarithm log : (0,∞) → R in
terms of algebraic functions is as the integral
\[
\log x = \int_1^x \frac{1}{t}\,dt.
\]
This integral is well-defined for every 0 < x < ∞ since 1/t is continuous on the
interval [1,x] if x > 1, or [x,1] if 0 < x < 1. The usual properties of the logarithm
follow from this representation. We have (log x)’ = 1/x by definition; and, for
example, by making the substitution s = xt in the second integral in the following
equation, when dt/t = ds/s, we get
\[
\log x + \log y = \int_1^x \frac{1}{t}\,dt + \int_1^y \frac{1}{t}\,dt = \int_1^x \frac{1}{t}\,dt + \int_x^{xy} \frac{1}{s}\,ds = \int_1^{xy} \frac{1}{t}\,dt = \log(xy).
\]
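
As an informal check that the integral definition behaves like a logarithm, the following Python sketch approximates the oriented integral of 1/t from 1 to x by a midpoint rule. The helper name `log_integral` and the sample values of x and y are assumptions made for the illustration.

```python
import math

def log_integral(x, n=100_000):
    """Midpoint-rule approximation of the oriented integral of 1/t from 1 to x (x > 0)."""
    step = (x - 1.0) / n
    return sum(1.0 / (1.0 + (i + 0.5) * step) for i in range(n)) * step

x, y = 2.0, 3.5
lhs = log_integral(x) + log_integral(y)   # log x + log y
rhs = log_integral(x * y)                 # log(xy)
reciprocal_sum = log_integral(0.5) + log_integral(2.0)  # log(1/2) + log 2
```

Up to quadrature error, `lhs` and `rhs` agree, and the values for x and 1/x cancel, reflecting the oriented integral over [x,1] when 0 < x < 1.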




Figure 1. Graphs of the error function y = F(x) (blue) and its derivative,
the Gaussian function y = f(x) (green), from Example 12.15.


We can also define many non-elementary functions as integrals. 


Example 12.15. The error function F = erf, defined by
\[
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt,
\]
is an antiderivative on R of the Gaussian function
\[
f(x) = \frac{2}{\sqrt{\pi}}\, e^{-x^2}.
\]
The error function isn’t expressible in terms of elementary functions. Nevertheless,
it is defined as a limit of Riemann sums for the integral. Figure 1 shows the graphs
of f and F. The name “error function” comes from the fact that the probability of
a Gaussian random variable deviating by more than a given amount from its mean
can be expressed in terms of F. Error functions also arise in other applications; for
example, in modeling diffusion processes such as heat flow.
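
Since the error function is a limit of Riemann sums, it can be approximated directly from its integral. The sketch below compares a midpoint Riemann sum with Python's built-in `math.erf`; the helper name `erf_riemann`, the number of subintervals, and the sample points are choices made for this illustration.

```python
import math

def erf_riemann(x, n=10_000):
    """Midpoint Riemann sum for (2/sqrt(pi)) * integral from 0 to x of exp(-t**2) dt."""
    step = x / n
    total = sum(math.exp(-((i + 0.5) * step) ** 2) for i in range(n)) * step
    return 2.0 / math.sqrt(math.pi) * total

# Triples (x, Riemann-sum approximation, library value).
samples = [(x, erf_riemann(x), math.erf(x)) for x in (-1.0, 0.5, 1.0, 2.0)]
```

For negative x the step is negative, so the sum approximates the oriented integral and reproduces the oddness of the error function.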


Example 12.16. The Fresnel sine function S is defined by
\[
S(x) = \int_0^x \sin\left( \frac{\pi t^2}{2} \right) dt.
\]
The function S is an antiderivative of $\sin(\pi t^2/2)$ on R (see Figure 2), but it can’t
be expressed in terms of elementary functions. Fresnel integrals arise, among other
places, in analysing the diffraction of waves, such as light waves. From the
perspective of complex analysis, they are closely related to the error function through the
Euler formula $e^{i\theta} = \cos\theta + i \sin\theta$.


Figure 2. Graphs of the Fresnel integral y = S(x) (blue) and its derivative
$y = \sin(\pi x^2/2)$ (green) from Example 12.16.


Discontinuous functions may or may not have an antiderivative and typically
don’t. Darboux proved that every function f : (a,b) → R that is the derivative of
a function F : (a,b) → R, where F’ = f at all points of (a,b), has the intermediate
value property. That is, for all c, d such that a < c < d < b and all y between
f(c) and f(d), there exists an x between c and d such that f(x) = y. A continuous
derivative has this property by the intermediate value theorem, but a discontinuous
derivative also has it. Thus, discontinuous functions without the intermediate value
property, such as ones with a jump discontinuity, don’t have an antiderivative. For
example, the function F in Example 12.7 is not an antiderivative of the step function
f on R since it isn’t differentiable at 0.


In dealing with functions that are not continuously differentiable, it turns out 
to be more useful to abandon the idea of a derivative that is defined pointwise 
everywhere (pointwise values of discontinuous functions are somewhat arbitrary) 
and introduce the notion of a weak derivative. We won’t define or study weak 
derivatives here. 


12.3. Integrals and sequences of functions 


A fundamental question that arises throughout analysis is the validity of an ex- 
change in the order of limits. Some sort of condition is always required. 


In this section, we consider the question of when the convergence of a sequence
of functions $f_n \to f$ implies the convergence of their integrals $\int f_n \to \int f$. Here,
we exchange a limit of a sequence of functions with a limit of the Riemann sums
that define their integrals. The two types of convergence we’ll discuss are pointwise
and uniform convergence, which are defined in Chapter 9.


As we show first, the Riemann integral is well-behaved with respect to uniform 
convergence. The drawback to uniform convergence is that it’s a strong form of 
convergence, and we often want to use a weaker form, such as pointwise convergence, 
in which case the Riemann integral may not be suitable. 




12.3.1. Uniform convergence. The uniform limit of continuous functions is 
continuous and therefore integrable. The next result shows, more generally, that 
the uniform limit of integrable functions is integrable. Furthermore, the limit of 
the integrals is the integral of the limit. 


Theorem 12.17. Suppose that $f_n : [a,b] \to \mathbb{R}$ is Riemann integrable for each
n ∈ N and $f_n \to f$ uniformly on [a,b] as n → ∞. Then f : [a,b] → R is Riemann
integrable on [a,b] and
\[
\int_a^b f = \lim_{n \to \infty} \int_a^b f_n.
\]

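Proof. The main statement we need to prove is that f is integrable. Let ε > 0.
Since $f_n \to f$ uniformly, there is an N ∈ N such that if n > N then
\[
f_n(x) - \frac{\epsilon}{b-a} < f(x) < f_n(x) + \frac{\epsilon}{b-a} \qquad \text{for all } a \le x \le b.
\]
It follows from Proposition 11.39 that
\[
L\left( f_n - \frac{\epsilon}{b-a} \right) \le L(f), \qquad U(f) \le U\left( f_n + \frac{\epsilon}{b-a} \right).
\]
Since $f_n$ is integrable and upper integrals are greater than lower integrals, we get
that
\[
\int_a^b f_n - \epsilon \le L(f) \le U(f) \le \int_a^b f_n + \epsilon
\]
for all n > N, which implies that
\[
0 \le U(f) - L(f) \le 2\epsilon.
\]
Since ε > 0 is arbitrary, we conclude that L(f) = U(f), so f is integrable. Moreover,
it follows that for all n > N we have
\[
\left| \int_a^b f_n - \int_a^b f \right| \le \epsilon,
\]
which shows that $\int_a^b f_n \to \int_a^b f$ as n → ∞.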


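Alternatively, once we know that the uniform limit of integrable functions is
integrable, the convergence of the integrals follows directly from the estimate
\[
\left| \int_a^b f_n - \int_a^b f \right| = \left| \int_a^b (f_n - f) \right| \le \sup_{x \in [a,b]} |f_n(x) - f(x)| \cdot (b-a) \to 0 \qquad \text{as } n \to \infty.
\]

Example 12.18. The function $f_n : [0,1] \to \mathbb{R}$ defined by
\[
f_n(x) = \frac{n + \cos x}{n e^x + \sin x}
\]
converges uniformly on [0,1] to $f(x) = e^{-x}$ since, for 0 ≤ x ≤ 1,
\[
\left| \frac{n + \cos x}{n e^x + \sin x} - e^{-x} \right| = \left| \frac{\cos x - e^{-x} \sin x}{n e^x + \sin x} \right| \le \frac{2}{n}.
\]
It follows that
\[
\lim_{n \to \infty} \int_0^1 \frac{n + \cos x}{n e^x + \sin x}\,dx = \int_0^1 e^{-x}\,dx = 1 - \frac{1}{e}.
\]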
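
The uniform convergence in this example, and the resulting convergence of the integrals, can be observed numerically. In the sketch below the grid sizes and function names are assumptions, and the bound 2/n used for comparison comes from the elementary estimates $|\cos x - e^{-x}\sin x| \le 2$ and $n e^x + \sin x \ge n$ on [0,1].

```python
import math

def f_n(n, x):
    return (n + math.cos(x)) / (n * math.exp(x) + math.sin(x))

def sup_error(n, samples=1_000):
    """Largest sampled value of |f_n(x) - exp(-x)| on a uniform grid in [0, 1]."""
    return max(abs(f_n(n, k / samples) - math.exp(-k / samples))
               for k in range(samples + 1))

def integral(n, steps=10_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / steps
    return sum(f_n(n, (i + 0.5) * h) for i in range(steps)) * h

limit = 1.0 - math.exp(-1.0)
# Triples (n, sampled sup-norm error, error in the integral).
report = [(n, sup_error(n), abs(integral(n) - limit)) for n in (1, 10, 100, 1000)]
```

Both error columns in `report` decay like 1/n, consistent with the estimate that bounds the difference of the integrals by the sup-norm distance times the length of the interval.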




Example 12.19. Every power series
\[
f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n + \dots
\]
with radius of convergence R > 0 converges uniformly on compact intervals inside
the interval |x| < R, so we can integrate it term-by-term to get
\[
\int_0^x f(t)\,dt = a_0 x + \frac{1}{2} a_1 x^2 + \frac{1}{3} a_2 x^3 + \dots + \frac{1}{n+1} a_n x^{n+1} + \dots \qquad \text{for } |x| < R.
\]


Example 12.20. If we integrate the geometric series
\[
\frac{1}{1-x} = 1 + x + x^2 + \dots + x^n + \dots \qquad \text{for } |x| < 1,
\]
we get a power series for log,
\[
\log\left( \frac{1}{1-x} \right) = x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \dots + \frac{1}{n} x^n + \dots \qquad \text{for } |x| < 1.
\]
For instance, taking x = 1/2, we get the rapidly convergent series
\[
\log 2 = \sum_{n=1}^{\infty} \frac{1}{n 2^n}
\]
for the irrational number log 2 ≈ 0.6931. This series was known and used by Euler.
For comparison, the alternating harmonic series in Example also converges 
to log 2, but it does so extremely slowly and would be a poor choice for computing 
a numerical approximation. 
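
The difference in convergence speed is easy to see in a short computation. The sketch below compares 20-term partial sums of the two series with `math.log(2)`; the function names and the number of terms are choices made for the illustration.

```python
import math

def geometric_type_sum(terms):
    """Partial sum of the series with terms 1 / (n * 2**n), n = 1, 2, ..."""
    return sum(1.0 / (n * 2 ** n) for n in range(1, terms + 1))

def alternating_harmonic_sum(terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (n + 1) / n for n in range(1, terms + 1))

fast_error = abs(geometric_type_sum(20) - math.log(2))
slow_error = abs(alternating_harmonic_sum(20) - math.log(2))
```

With 20 terms the first series is already accurate to about seven decimal places, while the alternating harmonic series is still off in the second decimal place.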


Although we can integrate uniformly convergent sequences, we cannot in gen- 
eral differentiate them. In fact, it’s often easier to prove results about the conver- 
gence of derivatives by using results about the convergence of integrals, together 
with the fundamental theorem of calculus. The following theorem provides
sufficient conditions for $f_n \to f$ to imply that $f_n' \to f'$.


Theorem 12.21. Let $f_n : (a,b) \to \mathbb{R}$ be a sequence of differentiable functions
whose derivatives $f_n' : (a,b) \to \mathbb{R}$ are integrable on (a,b). Suppose that $f_n \to f$
pointwise and $f_n' \to g$ uniformly on (a,b) as n → ∞, where g : (a,b) → R is
continuous. Then f : (a,b) → R is continuously differentiable on (a,b) and f’ = g.


Proof. Choose some point a < c < b. Since $f_n'$ is integrable, the fundamental
theorem of calculus, Theorem implies that
\[
f_n(x) = f_n(c) + \int_c^x f_n' \qquad \text{for } a < x < b.
\]
Since $f_n \to f$ pointwise and $f_n' \to g$ uniformly on [a,x], we find that
\[
f(x) = f(c) + \int_c^x g.
\]
Since g is continuous, the other direction of the fundamental theorem,
Theorem implies that f is differentiable in (a,b) and f’ = g.




In particular, this theorem shows that the limit of a uniformly convergent se- 
quence of continuously differentiable functions whose derivatives converge uniformly 
is also continuously differentiable. 


The key assumption in Theorem 12.21 is that the derivatives $f_n'$ converge
uniformly, not just pointwise; the result is false if we only assume pointwise convergence
of the $f_n'$. In the proof of the theorem, we only use the assumption that $f_n(x)$
converges at a single point x = c. This assumption together with the assumption that
$f_n' \to g$ uniformly implies that $f_n \to f$ uniformly, where
\[
f(x) = \lim_{n \to \infty} f_n(c) + \int_c^x g.
\]
Thus, the theorem remains true if we replace the assumption that $f_n \to f$ pointwise
on (a,b) by the weaker assumption that $\lim_{n \to \infty} f_n(c)$ exists for some c ∈ (a,b).
This isn’t an important change, however, because the restrictive assumption in the
theorem is the uniform convergence of the derivatives $f_n'$, not the pointwise (or
uniform) convergence of the functions $f_n$.


The assumption that $g = \lim f_n'$ is continuous is needed to show the
differentiability of f by the fundamental theorem, but the result remains true even if g isn’t
continuous. In that case, however, a different (and more complicated) proof is
required, which is given in Theorem 9.18.


12.3.2. Pointwise convergence. On its own, the pointwise convergence of 
functions is never sufficient to imply convergence of their integrals. 


Example 12.22. For n ∈ N, define $f_n : [0,1] \to \mathbb{R}$ by
\[
f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{if } x = 0 \text{ or } 1/n \le x \le 1. \end{cases}
\]
Then $f_n \to 0$ pointwise on [0,1] but
\[
\int_0^1 f_n = 1
\]
for every n ∈ N. By slightly modifying these functions to
\[
f_n(x) = \begin{cases} n^2 & \text{if } 0 < x < 1/n, \\ 0 & \text{if } x = 0 \text{ or } 1/n \le x \le 1, \end{cases}
\]
we get a sequence that converges pointwise to 0 but whose integrals diverge to ∞.
The fact that the $f_n$ are discontinuous is not important; we could replace the step
functions by continuous “tent” functions or smooth “bump” functions.
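
The behavior in this example can be tabulated exactly with rational arithmetic. In the sketch below, the sample point x = 1/7, the range of n, and the helper names are assumptions made for the illustration; the integral of a step function of a given height on (0, 1/n) is simply the height times 1/n.

```python
from fractions import Fraction

def f(n, x, height=None):
    """Step function equal to `height` (default n) on (0, 1/n) and 0 elsewhere on [0, 1]."""
    height = n if height is None else height
    return height if 0 < x < Fraction(1, n) else 0

def integral(n, height=None):
    """Exact integral over [0, 1]: the height times the length 1/n of (0, 1/n)."""
    height = n if height is None else height
    return Fraction(height, n)

x = Fraction(1, 7)  # a fixed sample point in (0, 1]
pointwise_values = [f(n, x) for n in range(1, 12)]          # eventually 0
integrals = [integral(n) for n in range(1, 12)]             # always 1
blown_up = [integral(n, height=n * n) for n in range(1, 12)]  # equal to n
```

At the fixed point the values vanish as soon as 1/n ≤ x, while the integrals stay equal to 1 (or grow like n for the modified functions).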


The behavior of the integral under pointwise convergence in the previous ex- 
ample is unavoidable whatever definition of the integral one uses. A more serious 
defect of the Riemann integral is that the pointwise limit of Riemann integrable 
functions needn’t be Riemann integrable at all, even if it is bounded. 




Example 12.23. Let $\{r_k : k \in \mathbb{N}\}$ be an enumeration of the rational numbers in
$[0,1]$ and define $f_n : [0,1] \to \mathbb{R}$ by

\[
f_n(x) = \begin{cases} 1 & \text{if } x = r_k \text{ for some } 1 \le k \le n, \\ 0 & \text{otherwise.} \end{cases}
\]

Then each $f_n$ is Riemann integrable since it differs from the zero function at finitely
many points. However, $f_n \to f$ pointwise on $[0,1]$ to the Dirichlet function $f$, which
is not Riemann integrable.


This is another place where the Lebesgue integral has better properties than 
the Riemann integral. The pointwise (or pointwise almost everywhere) limit of 
Lebesgue measurable functions is Lebesgue measurable. As Example [12.22] shows, 
we still need conditions to ensure the convergence of the integrals, but there are 
quite simple and general conditions for the Lebesgue integral (such as the monotone 
convergence and dominated convergence theorems). 


12.4. Improper Riemann integrals 


The Riemann integral is only defined for a bounded function on a compact interval 
(or a finite union of such intervals). Nevertheless, we frequently want to integrate 
unbounded functions or functions on an infinite interval. One way to interpret 
such integrals is as a limit of Riemann integrals; these limits are called improper 
Riemann integrals. 


12.4.1. Improper integrals. First, we define the improper integral of a function 
that fails to be integrable at one endpoint of a bounded interval. 


Definition 12.24. Suppose that $f : (a,b] \to \mathbb{R}$ is integrable on $[c,b]$ for every
$a < c < b$. Then the improper integral of $f$ on $[a,b]$ is

\[
\int_a^b f = \lim_{\epsilon \to 0^+} \int_{a+\epsilon}^b f.
\]

The improper integral converges if this limit exists (as a finite real number),
otherwise it diverges. Similarly, if $f : [a,b) \to \mathbb{R}$ is integrable on $[a,c]$ for
every $a < c < b$, then

\[
\int_a^b f = \lim_{\epsilon \to 0^+} \int_a^{b-\epsilon} f.
\]


We use the same notation to denote proper and improper integrals; it should
be clear from the context which integrals are proper Riemann integrals (i.e., ones
given by Definition 11.11) and which are improper. If $f$ is Riemann integrable on
$[a,b]$, then Proposition 11.50 shows that its improper and proper integrals agree,
but an improper integral may exist even if $f$ isn’t integrable.


Example 12.25. If $p > 0$, then the integral

\[
\int_0^1 \frac{1}{x^p}\,dx
\]


256 12. Properties and Applications of the Integral 


isn’t defined as a Riemann integral since $1/x^p$ is unbounded on $(0,1]$. The
corresponding improper integral is

\[
\int_0^1 \frac{1}{x^p}\,dx = \lim_{\epsilon \to 0^+} \int_\epsilon^1 \frac{1}{x^p}\,dx.
\]

For $p \ne 1$, we have

\[
\int_\epsilon^1 \frac{1}{x^p}\,dx = \frac{1 - \epsilon^{1-p}}{1-p},
\]

so the improper integral converges if $0 < p < 1$, with

\[
\int_0^1 \frac{1}{x^p}\,dx = \frac{1}{1-p},
\]

and diverges to $\infty$ if $p > 1$. The integral also diverges (more slowly) to $\infty$ if $p = 1$
since

\[
\int_\epsilon^1 \frac{1}{x}\,dx = \log\frac{1}{\epsilon}.
\]

Thus, we get a convergent improper integral if the integrand $1/x^p$ does not grow
too rapidly as $x \to 0^+$ (slower than $1/x$).
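
As a rough numerical illustration of Example 12.25, one can tabulate the truncated integrals over $[\epsilon, 1]$ as $\epsilon$ shrinks. This sketch uses the closed forms from the example; the function name `truncated_power_integral` is hypothetical.

```python
import math

def truncated_power_integral(p, eps):
    """Closed form of the integral of x**(-p) over [eps, 1], for 0 < eps < 1."""
    if p == 1:
        return math.log(1 / eps)
    return (1 - eps ** (1 - p)) / (1 - p)

for p in (0.5, 1.0, 2.0):
    # eps = 1e-2, 1e-4, 1e-8: the values settle only when 0 < p < 1
    values = [truncated_power_integral(p, 10.0 ** (-k)) for k in (2, 4, 8)]
    print(p, values)
```

For $p = 1/2$ the values approach $1/(1-p) = 2$; for $p = 1$ they grow like $\log(1/\epsilon)$, and for $p = 2$ like $1/\epsilon$.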


We define improper integrals on an unbounded interval as limits of integrals on 
bounded intervals. 


Definition 12.26. Suppose that $f : [a,\infty) \to \mathbb{R}$ is integrable on $[a,r]$ for every
$r > a$. Then the improper integral of $f$ on $[a,\infty)$ is

\[
\int_a^\infty f = \lim_{r \to \infty} \int_a^r f.
\]

Similarly, if $f : (-\infty, b] \to \mathbb{R}$ is integrable on $[r,b]$ for every $r < b$, then

\[
\int_{-\infty}^b f = \lim_{r \to \infty} \int_{-r}^b f.
\]


Let’s consider the convergence of the integral of the power function in
Example 12.25 at infinity rather than at zero.


Example 12.27. Suppose that $p > 0$. The improper integral

\[
\int_1^\infty \frac{1}{x^p}\,dx = \lim_{r \to \infty} \int_1^r \frac{1}{x^p}\,dx
= \lim_{r \to \infty} \left( \frac{r^{1-p} - 1}{1-p} \right)
\]

converges to $1/(p-1)$ if $p > 1$ and diverges to $\infty$ if $0 < p < 1$. It also diverges
(more slowly) if $p = 1$ since

\[
\int_1^\infty \frac{1}{x}\,dx = \lim_{r \to \infty} \int_1^r \frac{1}{x}\,dx = \lim_{r \to \infty} \log r = \infty.
\]

Thus, we get a convergent improper integral if the integrand $1/x^p$ decays sufficiently
rapidly as $x \to \infty$ (faster than $1/x$).


A divergent improper integral may diverge to $\infty$ (or $-\infty$) as in the previous
examples, or, if the integrand changes sign, it may oscillate.




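Example 12.28. Define $f : [0,\infty) \to \mathbb{R}$ by

\[
f(x) = (-1)^n \quad \text{for } n \le x < n+1, \text{ where } n = 0, 1, 2, \dots.
\]

Then $0 \le \int_0^r f \le 1$ and

\[
\int_0^n f = \begin{cases} 1 & \text{if $n$ is an odd integer,} \\ 0 & \text{if $n$ is an even integer.} \end{cases}
\]

Thus, the improper integral $\int_0^\infty f$ doesn’t converge.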
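
The oscillation in Example 12.28 can be made concrete by evaluating $\int_0^r f$ exactly for a few values of $r$. This is a small sketch; the function name `integral_of_alternating_step` is hypothetical.

```python
import math

def integral_of_alternating_step(r):
    """Exact integral over [0, r] of f(x) = (-1)**n on [n, n+1), n = 0, 1, 2, ..."""
    n = math.floor(r)                # number of complete unit intervals in [0, r]
    whole = n % 2                    # +1 -1 +1 ... over n intervals: 1 if n is odd, 0 if even
    partial = (-1) ** n * (r - n)    # contribution of the incomplete interval [n, r]
    return whole + partial

for r in (1, 1.5, 2, 2.5, 3, 10, 11):
    print(r, integral_of_alternating_step(r))
```

The values keep moving back and forth between 0 and 1, so they have no limit as $r \to \infty$.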


More general improper integrals may be defined as finite sums of improper
integrals of the previous forms. For example, if $f : [a,b] \setminus \{c\} \to \mathbb{R}$ is integrable on
closed intervals not including $a < c < b$, then

\[
\int_a^b f = \lim_{\delta \to 0^+} \int_a^{c-\delta} f + \lim_{\epsilon \to 0^+} \int_{c+\epsilon}^b f;
\]

and if $f : \mathbb{R} \to \mathbb{R}$ is integrable on every compact interval, then

\[
\int_{-\infty}^\infty f = \lim_{s \to \infty} \int_{-s}^c f + \lim_{r \to \infty} \int_c^r f,
\]

where we split the integral at an arbitrary point $c \in \mathbb{R}$. Note that each limit is
required to exist separately.


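Example 12.29. The improper Riemann integral

\[
\int_0^\infty \frac{1}{x^p}\,dx = \lim_{\epsilon \to 0^+} \int_\epsilon^1 \frac{1}{x^p}\,dx + \lim_{r \to \infty} \int_1^r \frac{1}{x^p}\,dx
\]

does not converge for any $p \in \mathbb{R}$, since the integral either diverges at 0 (if $p \ge 1$) or
at infinity (if $p \le 1$).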


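Example 12.30. If $f : [0,1] \to \mathbb{R}$ is continuous and $0 < c < 1$, then we define as
an improper integral

\[
\int_0^1 \frac{f(x)}{|x-c|^{1/2}}\,dx = \lim_{\delta \to 0^+} \int_0^{c-\delta} \frac{f(x)}{|x-c|^{1/2}}\,dx + \lim_{\epsilon \to 0^+} \int_{c+\epsilon}^1 \frac{f(x)}{|x-c|^{1/2}}\,dx.
\]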


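Example 12.31. Consider the following integral, called a Frullani integral,

\[
I = \int_0^\infty \frac{f(ax) - f(bx)}{x}\,dx,
\]

where $0 < a < b$ and $f : [0,\infty) \to \mathbb{R}$ is a continuous function whose limit as $x \to \infty$
exists. We write this limit as

\[
f(\infty) = \lim_{x \to \infty} f(x).
\]

We interpret the integral as an improper integral $I = I_1 + I_2$ where

\[
I_1 = \lim_{\epsilon \to 0^+} \int_\epsilon^1 \frac{f(ax) - f(bx)}{x}\,dx, \qquad
I_2 = \lim_{r \to \infty} \int_1^r \frac{f(ax) - f(bx)}{x}\,dx.
\]

Consider $I_1$. After making the substitutions $s = ax$ and $t = bx$ and using the
additivity property of the integral, we get that

\[
I_1 = \lim_{\epsilon \to 0^+} \left( \int_{\epsilon a}^{a} \frac{f(s)}{s}\,ds - \int_{\epsilon b}^{b} \frac{f(t)}{t}\,dt \right)
= \lim_{\epsilon \to 0^+} \left( \int_{\epsilon a}^{\epsilon b} \frac{f(t)}{t}\,dt \right) - \int_a^b \frac{f(t)}{t}\,dt.
\]

To evaluate the limit, we write

\[
\int_{\epsilon a}^{\epsilon b} \frac{f(t)}{t}\,dt
= \int_{\epsilon a}^{\epsilon b} \frac{f(t) - f(0)}{t}\,dt + f(0) \int_{\epsilon a}^{\epsilon b} \frac{1}{t}\,dt
= \int_{\epsilon a}^{\epsilon b} \frac{f(t) - f(0)}{t}\,dt + f(0) \log\left(\frac{b}{a}\right).
\]

Since $f$ is continuous at 0 and $t \ge \epsilon a$ in the interval of integration of length $\epsilon(b-a)$,
we have

\[
\left| \int_{\epsilon a}^{\epsilon b} \frac{f(t) - f(0)}{t}\,dt \right|
\le \left( \frac{b-a}{a} \right) \cdot \max\{ |f(t) - f(0)| : \epsilon a \le t \le \epsilon b \} \to 0
\]

as $\epsilon \to 0^+$. It follows that

\[
I_1 = f(0) \log\left(\frac{b}{a}\right) - \int_a^b \frac{f(t)}{t}\,dt.
\]

A similar argument gives

\[
I_2 = -f(\infty) \log\left(\frac{b}{a}\right) + \int_a^b \frac{f(t)}{t}\,dt.
\]

Adding these results, we conclude that

\[
\int_0^\infty \frac{f(ax) - f(bx)}{x}\,dx = \{ f(0) - f(\infty) \} \log\left(\frac{b}{a}\right).
\]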
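
The Frullani formula can be sanity-checked numerically for a specific choice of $f$. The sketch below assumes $f(x) = e^{-x}$, $a = 1$, $b = 2$ (so $f(0) = 1$ and $f(\infty) = 0$), truncates the integral at $x = 60$ where the integrand is negligible, and uses a hand-written composite Simpson rule; all names and these choices are illustrative assumptions, not part of the text.

```python
import math

def frullani_integrand(x, a, b):
    """(exp(-a x) - exp(-b x)) / x, extended by its limit b - a at x = 0."""
    if x == 0.0:
        return b - a
    return (math.exp(-a * x) - math.exp(-b * x)) / x

def simpson(g, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with n subintervals (n must be even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

a, b = 1.0, 2.0
approx = simpson(lambda x: frullani_integrand(x, a, b), 0.0, 60.0, 60000)
exact = (1.0 - 0.0) * math.log(b / a)   # {f(0) - f(infinity)} log(b/a)
print(approx, exact)
```

The two printed numbers should agree to many digits, consistent with the value $\{f(0) - f(\infty)\}\log(b/a) = \log 2$.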


12.4.2. Absolutely convergent improper integrals. The convergence of
improper integrals is analogous to the convergence of series. A series $\sum a_n$ converges
absolutely if $\sum |a_n|$ converges, and conditionally if $\sum a_n$ converges but $\sum |a_n|$
diverges. We introduce a similar definition for improper integrals and provide a
test for the absolute convergence of an improper integral that is analogous to the
comparison test for series.


Definition 12.32. An improper integral $\int_a^b f$ is absolutely convergent if the
improper integral $\int_a^b |f|$ converges, and conditionally convergent if $\int_a^b f$ converges but
$\int_a^b |f|$ diverges.


As part of the next theorem, we prove that an absolutely convergent improper 
integral converges (similarly, an absolutely convergent series converges). 


Theorem 12.33. Suppose that $f, g : I \to \mathbb{R}$ are defined on some finite or infinite
interval $I$. If $|f| \le g$ and the improper integral $\int_I g$ converges, then the improper
integral $\int_I f$ converges absolutely. Moreover, an absolutely convergent improper
integral converges.


Proof. To be specific, we suppose that $f, g : [a,\infty) \to \mathbb{R}$ are integrable on $[a,r]$ for
$r > a$ and consider the improper integral

\[
\int_a^\infty f = \lim_{r \to \infty} \int_a^r f.
\]

A similar argument applies to other types of improper integrals.

First, suppose that $f \ge 0$. Then

\[
\int_a^r f \le \int_a^r g \le \int_a^\infty g,
\]

so $\int_a^r f$ is a monotonic increasing function of $r$ that is bounded from above.
Therefore it converges as $r \to \infty$.

In general, we decompose $f$ into its positive and negative parts,

\[
f = f_+ - f_-, \qquad |f| = f_+ + f_-, \qquad f_+ = \max\{f, 0\}, \qquad f_- = \max\{-f, 0\}.
\]

We have $0 \le f_\pm \le g$, so the improper integrals of $f_\pm$ converge by the previous
argument, and therefore so does the improper integral of $f$:

\[
\int_a^\infty f = \lim_{r \to \infty} \left( \int_a^r f_+ - \int_a^r f_- \right)
= \lim_{r \to \infty} \int_a^r f_+ - \lim_{r \to \infty} \int_a^r f_-
= \int_a^\infty f_+ - \int_a^\infty f_-.
\]

Moreover, since $0 \le f_\pm \le |f|$, we see that $\int_a^\infty f_+$ and $\int_a^\infty f_-$ both converge if
$\int_a^\infty |f|$ converges, and therefore so does $\int_a^\infty f$, so an absolutely convergent improper
integral converges.


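Example 12.34. Consider the limiting behavior of the error function $\operatorname{erf}(x)$ in
Example 12.15 as $x \to \infty$, which is given by

\[
\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-x^2}\,dx = \frac{2}{\sqrt{\pi}} \lim_{r \to \infty} \int_0^r e^{-x^2}\,dx.
\]

The convergence of this improper integral follows by comparison with $e^{-x}$, for
example, since

\[
0 \le e^{-x^2} \le e^{-x} \quad \text{for } x \ge 1,
\]

and

\[
\int_1^\infty e^{-x}\,dx = \lim_{r \to \infty} \int_1^r e^{-x}\,dx = \lim_{r \to \infty} \left( e^{-1} - e^{-r} \right) = \frac{1}{e}.
\]

This argument proves that the error function approaches a finite limit as $x \to \infty$,
but it doesn’t give the exact value, only an upper bound

\[
\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-x^2}\,dx \le M, \qquad
M = \frac{2}{\sqrt{\pi}} \int_0^1 e^{-x^2}\,dx + \frac{2}{\sqrt{\pi}} \cdot \frac{1}{e} \le \frac{2}{\sqrt{\pi}} \left( 1 + \frac{1}{e} \right).
\]

One can evaluate this improper integral exactly, with the result that

\[
\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-x^2}\,dx = 1.
\]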


Figure 3. Graph of $y = (\sin x)/(1 + x^2)$ from Example 12.35. The dashed
green lines are the graphs of $y = \pm 1/x^2$.


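The standard trick to obtain this result (apparently introduced by Laplace) uses
double integration, polar coordinates, and the substitution $u = r^2$:

\[
\left( \int_0^\infty e^{-x^2}\,dx \right)^2
= \int_0^\infty \int_0^\infty e^{-x^2 - y^2}\,dx\,dy
= \int_0^{\pi/2} \left( \int_0^\infty e^{-r^2} r\,dr \right) d\theta
= \frac{\pi}{4} \int_0^\infty e^{-u}\,du
= \frac{\pi}{4}.
\]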

This formal computation can be justified rigorously, but we won’t do so here. There 
are also many other ways to obtain the same result. 
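
The limit of the error function and the cruder comparison bound from Example 12.34 can be compared numerically. This sketch relies on `math.erf` from the Python standard library; the variable name `bound` is an illustrative choice.

```python
import math

# Upper bound (2/sqrt(pi)) * (1 + 1/e) obtained from comparison with exp(-x)
bound = 2 / math.sqrt(math.pi) * (1 + 1 / math.e)

for r in (0.5, 1.0, 2.0, 4.0, 6.0):
    # erf(r) = (2/sqrt(pi)) * integral of exp(-x^2) over [0, r]
    print(r, math.erf(r))

print("comparison bound:", bound)
```

The values of $\operatorname{erf}(r)$ increase toward 1, which is consistent with the exact evaluation and lies below the comparison bound.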


Example 12.35. The improper integral

\[
\int_0^\infty \frac{\sin x}{1 + x^2}\,dx = \lim_{r \to \infty} \int_0^r \frac{\sin x}{1 + x^2}\,dx
\]

converges absolutely, since

\[
\int_0^\infty \left| \frac{\sin x}{1 + x^2} \right| dx = \int_0^1 \left| \frac{\sin x}{1 + x^2} \right| dx + \int_1^\infty \left| \frac{\sin x}{1 + x^2} \right| dx
\]

and (see Figure 3)

\[
\left| \frac{\sin x}{1 + x^2} \right| \le \frac{1}{x^2} \quad \text{for } x \ge 1, \qquad \int_1^\infty \frac{1}{x^2}\,dx < \infty.
\]

The value of this integral doesn’t have an elementary expression, but by using
contour integration from complex analysis one can show that

\[
\int_0^\infty \frac{\sin x}{1 + x^2}\,dx = \frac{1}{2e} \operatorname{Ei}(1) - \frac{e}{2} \operatorname{Ei}(-1) \approx 0.6468,
\]

where Ei is the exponential integral function defined in Example 12.41.


Improper integrals, and the principal value integrals discussed below, arise 
frequently in complex analysis, and many such integrals can be evaluated by contour 
integration. 


Example 12.36. The improper integral

\[
\int_0^\infty \frac{\sin x}{x}\,dx = \lim_{r \to \infty} \int_0^r \frac{\sin x}{x}\,dx = \frac{\pi}{2}
\]

converges conditionally. We leave the proof as an exercise. Note that there is no
difficulty at 0, since $\sin x / x \to 1$ as $x \to 0$, and comparison with the function
$1/x$ doesn’t imply absolute convergence at infinity because the improper integral
$\int_1^\infty 1/x\,dx$ diverges. There are many ways to show that the exact value of the
improper integral is $\pi/2$. The standard method uses contour integration.


Example 12.37. Consider the limiting behavior of the Fresnel sine function $S(x)$
in Example 12.16 as $x \to \infty$. The improper integral

\[
\int_0^\infty \sin\left( \frac{\pi x^2}{2} \right) dx = \lim_{r \to \infty} \int_0^r \sin\left( \frac{\pi x^2}{2} \right) dx = \frac{1}{2}
\]

converges conditionally. This example may seem surprising since the integrand
$\sin(\pi x^2/2)$ doesn’t converge to 0 as $x \to \infty$. The explanation is that the integrand
oscillates more rapidly with increasing $x$, leading to a more rapid cancelation
between positive and negative values in the integral (see Figure 2). The exact value
can be found by contour integration, again, which shows that

\[
\int_0^\infty \sin\left( \frac{\pi x^2}{2} \right) dx = \frac{1}{\sqrt{2}} \int_0^\infty \exp\left( -\frac{\pi x^2}{2} \right) dx.
\]

Evaluation of the resulting Gaussian integral gives 1/2.


12.5. * Principal value integrals 


Some integrals have a singularity that is too strong for them to converge as im- 
proper integrals but, due to cancelation between positive and negative parts of the 
integrand, they have a finite limit as a principal value integral. We begin with an 
example. 


Example 12.38. Consider $f : [-1,1] \setminus \{0\} \to \mathbb{R}$ defined by

\[
f(x) = \frac{1}{x}.
\]

The definition of the integral of $f$ on $[-1,1]$ as an improper integral is

\[
\int_{-1}^1 \frac{1}{x}\,dx = \lim_{\delta \to 0^+} \int_{-1}^{-\delta} \frac{1}{x}\,dx + \lim_{\epsilon \to 0^+} \int_\epsilon^1 \frac{1}{x}\,dx
= \lim_{\delta \to 0^+} \log \delta - \lim_{\epsilon \to 0^+} \log \epsilon.
\]

Neither limit exists, so the improper integral diverges. (Formally, we get $\infty - \infty$.)
If, however, we take $\delta = \epsilon$ and combine the limits, we get a convergent principal
value integral, which is defined by

\[
\text{p.v.} \int_{-1}^1 \frac{1}{x}\,dx = \lim_{\epsilon \to 0^+} \left( \int_{-1}^{-\epsilon} \frac{1}{x}\,dx + \int_\epsilon^1 \frac{1}{x}\,dx \right) = \lim_{\epsilon \to 0^+} (\log \epsilon - \log \epsilon) = 0.
\]

The value of 0 is what one might expect from the oddness of the integrand. A
cancelation in the contributions from either side of the singularity is essential to
obtain a finite limit.

The principal value integral of $1/x$ on a non-symmetric interval about 0 still
exists but is non-zero. For example, if $b > 0$, then

\[
\text{p.v.} \int_{-1}^b \frac{1}{x}\,dx = \lim_{\epsilon \to 0^+} \left( \int_{-1}^{-\epsilon} \frac{1}{x}\,dx + \int_\epsilon^b \frac{1}{x}\,dx \right) = \lim_{\epsilon \to 0^+} (\log \epsilon + \log b - \log \epsilon) = \log b.
\]

The crucial feature of a principal value integral is that we remove a symmetric
interval around a singular point, or infinity. The resulting cancelation in the integral
of a non-integrable function that changes sign across the singularity may lead to a
finite limit.
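
To see why the symmetry of the removed interval matters for $1/x$, one can evaluate the truncated integrals from the logarithmic antiderivative with independent cut-offs $\delta$ and $\epsilon$. This is a sketch only; the function name `truncated_integral` and the sample values are hypothetical.

```python
import math

def truncated_integral(b, delta, eps):
    """Integral of 1/x over [-1, -delta] plus [eps, b], for b > 0 and small delta, eps > 0."""
    left = math.log(delta)                 # integral of 1/x over [-1, -delta]
    right = math.log(b) - math.log(eps)    # integral of 1/x over [eps, b]
    return left + right

b = 3.0
for eps in (1e-2, 1e-4, 1e-8):
    symmetric = truncated_integral(b, eps, eps)       # delta = eps
    skewed = truncated_integral(b, 2 * eps, eps)      # delta = 2 * eps
    print(eps, symmetric, skewed)
```

With $\delta = \epsilon$ the value is $\log b$ for every $\epsilon$, whereas $\delta = 2\epsilon$ gives $\log b + \log 2$; the answer depends on how the two cut-offs are linked, which is why the improper integral (with independent limits) fails to exist.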


Definition 12.39. If $f : [a,b] \setminus \{c\} \to \mathbb{R}$ is integrable on closed intervals not
including $a < c < b$, then the principal value integral of $f$ on $[a,b]$ is

\[
\text{p.v.} \int_a^b f = \lim_{\epsilon \to 0^+} \left( \int_a^{c-\epsilon} f + \int_{c+\epsilon}^b f \right).
\]

If $f : \mathbb{R} \to \mathbb{R}$ is integrable on compact intervals, then the principal value integral of
$f$ on $\mathbb{R}$ is

\[
\text{p.v.} \int_{-\infty}^\infty f = \lim_{r \to \infty} \int_{-r}^r f.
\]

If the improper integral exists, then the principal value integral exists and
is equal to the improper integral. As Example 12.38 shows, the principal value
integral may exist even if the improper integral does not. Of course, a principal
value integral may also diverge.


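Example 12.40. Consider the principal value integral

\[
\text{p.v.} \int_{-1}^1 \frac{1}{x^2}\,dx = \lim_{\epsilon \to 0^+} \left( \int_{-1}^{-\epsilon} \frac{1}{x^2}\,dx + \int_\epsilon^1 \frac{1}{x^2}\,dx \right).
\]

In this case, the function $1/x^2$ is positive and approaches $\infty$ on both sides of the
singularity at $x = 0$, so there is no cancelation and the principal value integral
diverges to $\infty$.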


Principal value integrals arise frequently in complex analysis, harmonic analy- 
sis, and a variety of applications. 


Figure 4. Graphs of the exponential integral $y = \operatorname{Ei}(x)$ (blue) and its
derivative $y = e^x/x$ (green) from Example 12.41.


Example 12.41. The exponential integral Ei is a non-elementary function defined
by

\[
\operatorname{Ei}(x) = \int_{-\infty}^x \frac{e^t}{t}\,dt.
\]

Its graph is shown in Figure 4. This integral has to be understood, in general, as an
improper, principal value integral, and the function has a logarithmic singularity
at $x = 0$.

If $x < 0$, then the integrand is continuous for $-\infty < t \le x$, and the integral is
interpreted as an improper integral,

\[
\int_{-\infty}^x \frac{e^t}{t}\,dt = \lim_{r \to \infty} \int_{-r}^x \frac{e^t}{t}\,dt.
\]

This improper integral converges absolutely by comparison with $e^t$, since

\[
\left| \frac{e^t}{t} \right| \le e^t \quad \text{for } -\infty < t \le -1,
\]

and

\[
\int_{-\infty}^{-1} e^t\,dt = \lim_{r \to \infty} \int_{-r}^{-1} e^t\,dt = \lim_{r \to \infty} \left( e^{-1} - e^{-r} \right) = \frac{1}{e}.
\]

If $x > 0$, then the integrand has a non-integrable singularity at $t = 0$, and we
interpret it as a principal value integral. We write

\[
\int_{-\infty}^x \frac{e^t}{t}\,dt = \int_{-\infty}^{-1} \frac{e^t}{t}\,dt + \int_{-1}^x \frac{e^t}{t}\,dt.
\]

The first integral is interpreted as an improper integral as before. The second
integral is interpreted as a principal value integral

\[
\text{p.v.} \int_{-1}^x \frac{e^t}{t}\,dt = \lim_{\epsilon \to 0^+} \left( \int_{-1}^{-\epsilon} \frac{e^t}{t}\,dt + \int_\epsilon^x \frac{e^t}{t}\,dt \right).
\]

This principal value integral converges, since

\[
\text{p.v.} \int_{-1}^x \frac{e^t}{t}\,dt = \int_{-1}^x \frac{e^t - 1}{t}\,dt + \text{p.v.} \int_{-1}^x \frac{1}{t}\,dt = \int_{-1}^x \frac{e^t - 1}{t}\,dt + \log x.
\]

The first integral makes sense as a Riemann integral since the integrand has a
removable singularity at $t = 0$, with

\[
\lim_{t \to 0} \left( \frac{e^t - 1}{t} \right) = 1,
\]

so it extends to a continuous function on $[-1, x]$.

Finally, if $x = 0$, then the integrand is unbounded at the right endpoint $t = 0$.
The corresponding improper integral diverges, and $\operatorname{Ei}(0)$ is undefined.

The exponential integral arises in physical applications such as heat flow and
radiative transfer. It is also related to the logarithmic integral

\[
\operatorname{li}(x) = \int_0^x \frac{dt}{\log t}
\]

by $\operatorname{li}(x) = \operatorname{Ei}(\log x)$. The logarithmic integral is important in number theory,
where it gives an asymptotic approximation for the number of primes less than $x$
as $x \to \infty$.


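Example 12.42. Let $f : \mathbb{R} \to \mathbb{R}$ and assume, for simplicity, that $f$ has compact
support, meaning that $f = 0$ outside a compact interval $[-r,r]$. If $f$ is integrable,
we define the Hilbert transform $Hf : \mathbb{R} \to \mathbb{R}$ of $f$ by the principal value integral

\[
Hf(x) = \frac{1}{\pi}\, \text{p.v.} \int_{-\infty}^\infty \frac{f(t)}{x - t}\,dt
= \frac{1}{\pi} \lim_{\epsilon \to 0^+} \left( \int_{-\infty}^{x-\epsilon} \frac{f(t)}{x - t}\,dt + \int_{x+\epsilon}^\infty \frac{f(t)}{x - t}\,dt \right).
\]

Here, $x$ plays the role of a parameter in the integral with respect to $t$. We use
a principal value because the integrand may have a non-integrable singularity at
$t = x$. Since $f$ has compact support, the intervals of integration are bounded and
there is no issue with the convergence of the integrals at infinity.

For example, suppose that $f$ is the step function

\[
f(x) = \begin{cases} 1 & \text{for } 0 < x < 1, \\ 0 & \text{for } x < 0 \text{ or } x > 1. \end{cases}
\]

If $x < 0$ or $x > 1$, then $t \ne x$ for $0 < t < 1$, and we get a proper Riemann integral

\[
Hf(x) = \frac{1}{\pi} \int_0^1 \frac{1}{x - t}\,dt = \frac{1}{\pi} \log\left( \frac{x}{x - 1} \right).
\]

If $0 < x < 1$, then we get a principal value integral

\[
Hf(x) = \frac{1}{\pi} \lim_{\epsilon \to 0^+} \left( \int_0^{x-\epsilon} \frac{1}{x - t}\,dt + \int_{x+\epsilon}^1 \frac{1}{x - t}\,dt \right)
= \frac{1}{\pi} \lim_{\epsilon \to 0^+} \left[ \log\left( \frac{x}{\epsilon} \right) + \log\left( \frac{\epsilon}{1 - x} \right) \right]
= \frac{1}{\pi} \log\left( \frac{x}{1 - x} \right).
\]

The principal value integral with respect to $t$ diverges if $x = 0, 1$ because $f(t)$ has
a jump discontinuity at the point where $t = x$. Consequently the values $Hf(0)$,
$Hf(1)$ of the Hilbert transform of the step function are undefined.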


12.6. The integral test for series 


As a further application of the improper integral, we prove a useful test for the
convergence or divergence of a monotone decreasing, positive series. The idea is to
interpret the series as an upper or lower sum of an integral.


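Theorem 12.43 (Integral test). Suppose that $f : [1,\infty) \to \mathbb{R}$ is a positive
decreasing function (i.e., $0 \le f(x) \le f(y)$ for $x \ge y$). Let $a_n = f(n)$. Then the
series

\[
\sum_{n=1}^\infty a_n
\]

converges if and only if the improper integral

\[
\int_1^\infty f(x)\,dx
\]

converges. Furthermore, the limit

\[
D = \lim_{n \to \infty} \left[ \sum_{k=1}^n a_k - \int_1^n f(x)\,dx \right]
\]

exists, and $0 \le D \le a_1$.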
Proof. Let

\[
S_n = \sum_{k=1}^n a_k, \qquad T_n = \int_1^n f(x)\,dx.
\]

The integral $T_n$ exists since $f$ is monotone, and the sequences $(S_n)$, $(T_n)$ are
increasing since $f$ is positive.

Let

\[
P_n = \{ [1,2], [2,3], \dots, [n-1,n] \}
\]

be the partition of $[1,n]$ into $n - 1$ intervals of length 1. Since $f$ is decreasing,

\[
\sup_{[k,k+1]} f = a_k, \qquad \inf_{[k,k+1]} f = a_{k+1},
\]

and the upper and lower sums of $f$ on $P_n$ are given by

\[
U(f; P_n) = \sum_{k=1}^{n-1} a_k, \qquad L(f; P_n) = \sum_{k=1}^{n-1} a_{k+1}.
\]

Since the integral of $f$ on $[1,n]$ is bounded by its upper and lower sums, we get that

\[
S_n - a_1 \le T_n \le S_{n-1}.
\]

This inequality shows that $(T_n)$ is bounded from above by $S$ if $S_n \uparrow S$, and $(S_n)$
is bounded from above by $T + a_1$ if $T_n \uparrow T$, so $(S_n)$ converges if and only if $(T_n)$
converges, which proves the first part of the theorem.

Let $D_n = S_n - T_n$. Then the inequality shows that $a_n \le D_n \le a_1$; in particular,
$(D_n)$ is bounded from below by zero. Moreover, since $f$ is decreasing,

\[
D_n - D_{n+1} = \int_n^{n+1} f(x)\,dx - a_{n+1} \ge f(n+1) \cdot 1 - a_{n+1} = 0,
\]

so $(D_n)$ is decreasing. Therefore $D_n \downarrow D$ where $0 \le D \le a_1$, which proves the
second part of the theorem.


A basic application of this result is to the p-series. 


Example 12.44. Applying Theorem 12.43 to the function $f(x) = 1/x^p$ and using
Example 12.27, we find that

\[
\sum_{n=1}^\infty \frac{1}{n^p}
\]

converges if $p > 1$ and diverges if $0 < p \le 1$.


Theorem 12.43 is also useful for divergent series, since it tells us how quickly
their partial sums diverge. We remark that one can obtain similar, but more
accurate, asymptotic approximations than the one in Theorem 12.43 for the behavior
of the partial sums in terms of integrals, called the Euler-Maclaurin summation
formulae.


Example 12.45. Applying the second part of Theorem 12.43 to the function
$f(x) = 1/x$, we find that

\[
\lim_{n \to \infty} \left[ \sum_{k=1}^n \frac{1}{k} - \log n \right] = \gamma,
\]

where the limit $0 < \gamma < 1$ is the Euler constant.
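
The monotone convergence of $D_n = \sum_{k=1}^n 1/k - \log n$ to the Euler constant can be observed numerically. The following is a minimal sketch; the function name `D` mirrors the notation of Theorem 12.43, and the reference value 0.5772156649 is the standard decimal approximation of the Euler constant.

```python
import math

def D(n):
    """D_n = 1 + 1/2 + ... + 1/n - log n, the quantity in the integral test for f(x) = 1/x."""
    return math.fsum(1.0 / k for k in range(1, n + 1)) - math.log(n)

for n in (1, 10, 100, 1000, 10**5):
    print(n, D(n))
```

The printed values decrease from $D_1 = 1$ and stay between $a_n = 1/n$ and $a_1 = 1$, as the proof of the theorem predicts.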


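Example 12.46. We can use the result of Example 12.45 to compute the sum $A$
of the alternating harmonic series

\[
A = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots.
\]

The partial sum of the first $2m$ terms is given by

\[
A_{2m} = \sum_{k=1}^m \frac{1}{2k-1} - \sum_{k=1}^m \frac{1}{2k}
= \sum_{k=1}^{2m} \frac{1}{k} - 2 \sum_{k=1}^m \frac{1}{2k}
= \sum_{k=1}^{2m} \frac{1}{k} - \sum_{k=1}^m \frac{1}{k}.
\]

Here, we rewrite a sum of the odd terms in the harmonic series as the difference
between the harmonic series and its even terms, then use the fact that a sum of the
even terms in the harmonic series is one-half the sum of the series. It follows that

\[
\lim_{m \to \infty} A_{2m} = \lim_{m \to \infty} \left\{ \left[ \sum_{k=1}^{2m} \frac{1}{k} - \log 2m \right] - \left[ \sum_{k=1}^m \frac{1}{k} - \log m \right] + \log 2m - \log m \right\}.
\]

Since $\log 2m - \log m = \log 2$, we get that

\[
\lim_{m \to \infty} A_{2m} = \gamma - \gamma + \log 2 = \log 2.
\]

The odd partial sums also converge to $\log 2$ since

\[
A_{2m+1} = A_{2m} + \frac{1}{2m+1} \to \log 2 \quad \text{as } m \to \infty.
\]

Thus,

\[
\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = \log 2.
\]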

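Example 12.47. A similar calculation to the previous one can be used to
compute the sum $S$ of the rearrangement of the alternating harmonic series

\[
S = 1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \cdots
\]

discussed in Example 4.32. The partial sum $S_{3m}$ of the series may be written in
terms of partial sums of the harmonic series as

\[
\begin{aligned}
S_{3m} &= 1 - \frac{1}{2} - \frac{1}{4} + \cdots + \frac{1}{2m-1} - \frac{1}{4m-2} - \frac{1}{4m} \\
&= \sum_{k=1}^m \left( \frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k} \right) \\
&= \sum_{k=1}^{2m} \frac{1}{k} - \sum_{k=1}^m \frac{1}{2k} - \sum_{k=1}^{2m} \frac{1}{2k} \\
&= \sum_{k=1}^{2m} \frac{1}{k} - \frac{1}{2} \sum_{k=1}^m \frac{1}{k} - \frac{1}{2} \sum_{k=1}^{2m} \frac{1}{k}.
\end{aligned}
\]

It follows that

\[
\lim_{m \to \infty} S_{3m} = \lim_{m \to \infty} \left\{ \left[ \sum_{k=1}^{2m} \frac{1}{k} - \log 2m \right] - \frac{1}{2} \left[ \sum_{k=1}^m \frac{1}{k} - \log m \right] - \frac{1}{2} \left[ \sum_{k=1}^{2m} \frac{1}{k} - \log 2m \right] + \log 2m - \frac{1}{2} \log m - \frac{1}{2} \log 2m \right\}.
\]

Since $\log 2m - \frac{1}{2} \log m - \frac{1}{2} \log 2m = \frac{1}{2} \log 2$, we get that

\[
\lim_{m \to \infty} S_{3m} = \gamma - \frac{1}{2}\gamma - \frac{1}{2}\gamma + \frac{1}{2} \log 2 = \frac{1}{2} \log 2.
\]

Finally, since

\[
\lim_{m \to \infty} (S_{3m+1} - S_{3m}) = \lim_{m \to \infty} (S_{3m+2} - S_{3m}) = 0,
\]

we conclude that the whole series converges to $S = \frac{1}{2} \log 2$.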
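
A direct numerical check of the two sums, $\log 2$ for the alternating harmonic series and $\frac{1}{2}\log 2$ for its rearrangement, might look as follows. This is a sketch; the function names are hypothetical, and convergence is slow (the error is of order $1/m$).

```python
import math

def alternating_partial_sum(m):
    """A_{2m}: the first 2m terms of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return math.fsum((-1) ** (n + 1) / n for n in range(1, 2 * m + 1))

def rearranged_partial_sum(m):
    """S_{3m}: the first 3m terms of 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ..."""
    return math.fsum(1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
                     for k in range(1, m + 1))

for m in (10, 1000, 10**5):
    print(m, alternating_partial_sum(m), rearranged_partial_sum(m))

print("log 2 =", math.log(2), " (1/2) log 2 =", 0.5 * math.log(2))
```

Both series use exactly the same terms, yet the rearranged one converges to half the value of the original, in line with the examples above.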


12.7. Taylor’s theorem with integral remainder 


In Theorem [8.46] we gave an expression for the error between a function and its 
Taylor polynomial of order n in terms of the Lagrange remainder, which involves 
a pointwise value of the derivative of order n + 1 evaluated at some intermediate 
point. In the next theorem, we give an alternative expression for the error in terms 
of an integral of the derivative. 


Theorem 12.48 (Taylor with integral remainder). Suppose that $f : (a,b) \to \mathbb{R}$
has $n + 1$ derivatives on $(a,b)$ and $f^{(n+1)}$ is Riemann integrable on every subinterval
of $(a,b)$. Let $a < c < b$. Then for every $a < x < b$,

\[
f(x) = f(c) + f'(c)(x - c) + \frac{1}{2!} f''(c)(x - c)^2 + \cdots + \frac{1}{n!} f^{(n)}(c)(x - c)^n + R_n(x)
\]

where

\[
R_n(x) = \frac{1}{n!} \int_c^x f^{(n+1)}(t)(x - t)^n\,dt.
\]


Proof. We use proof by induction. The formula is true for $n = 0$, since the
fundamental theorem of calculus (Theorem 12.1) implies that

\[
f(x) = f(c) + \int_c^x f'(t)\,dt = f(c) + R_0(x).
\]

Assume that the formula is true for some $n \in \mathbb{N}_0$ and $f^{(n+2)}$ is Riemann
integrable. Then, since

\[
(x - t)^n = -\frac{1}{n+1} \frac{d}{dt} (x - t)^{n+1},
\]

an integration by parts with respect to $t$ (Theorem 12.10) implies that

\[
\begin{aligned}
R_n(x) &= -\frac{1}{(n+1)!} \left[ f^{(n+1)}(t)(x - t)^{n+1} \right]_c^x + \frac{1}{(n+1)!} \int_c^x f^{(n+2)}(t)(x - t)^{n+1}\,dt \\
&= \frac{1}{(n+1)!} f^{(n+1)}(c)(x - c)^{n+1} + R_{n+1}(x).
\end{aligned}
\]

Use of this equation in the formula for $n$ gives the formula for $n + 1$, which proves
the result.


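By making the change of variable

\[
t = c + s(x - c),
\]

we can also write the remainder as

\[
R_n(x) = \frac{1}{n!} \left[ \int_0^1 f^{(n+1)}\big(c + s(x - c)\big)(1 - s)^n\,ds \right] (x - c)^{n+1}.
\]

In particular, if $|f^{(n+1)}(x)| \le M$ for $a < x < b$, then

\[
|R_n(x)| \le \frac{1}{n!} M \left[ \int_0^1 (1 - s)^n\,ds \right] |x - c|^{n+1} \le \frac{M}{(n+1)!} |x - c|^{n+1},
\]

which agrees with what one gets from the Lagrange remainder.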


Thus, the integral form of the remainder is as effective as the Lagrange form 
in estimating its size from a uniform bound on the derivative. The integral form 
requires slightly stronger assumptions than the Lagrange form, since we need to 
assume that the derivative of order n+1 is integrable, but its proof is straightforward 
once we have the integral. Moreover, the integral form generalizes to vector-valued 
functions $f : (a,b) \to \mathbb{R}^n$, while the Lagrange form does not.
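
As an illustration of Theorem 12.48, the integral remainder can be approximated numerically and compared with the actual error of a Taylor polynomial. The sketch below assumes $f = \exp$ and $c = 0$ (so every derivative is $\exp$), and approximates the integral by a composite Simpson rule; the function names, the sample values $x = 1.5$, $n = 4$, and the step count are illustrative choices.

```python
import math

def taylor_polynomial_exp(x, n):
    """Taylor polynomial of exp of order n about c = 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def integral_remainder_exp(x, n, steps=2000):
    """R_n(x) = (1/n!) * integral over [0, x] of exp(t) (x - t)^n dt, by composite Simpson."""
    g = lambda t: math.exp(t) * (x - t) ** n
    h = x / steps                       # steps must be even for Simpson's rule
    total = g(0.0) + g(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3 / math.factorial(n)

x, n = 1.5, 4
lhs = math.exp(x)
rhs = taylor_polynomial_exp(x, n) + integral_remainder_exp(x, n)
# Uniform bound M |x - c|^(n+1) / (n+1)! with M = exp(x) on (0, x)
bound = math.exp(x) * abs(x) ** (n + 1) / math.factorial(n + 1)
print(lhs, rhs, integral_remainder_exp(x, n), bound)
```

The first two printed numbers should agree up to quadrature error, and the remainder should be no larger than the uniform bound.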


Chapter 13 


Metric, Normed, and 
Topological Spaces 


A metric space is a set $X$ that has a notion of the distance $d(x,y)$ between every
pair of points $x, y \in X$. A fundamental example is $\mathbb{R}$ with the absolute-value metric
$d(x,y) = |x - y|$, and nearly all of the concepts we discuss below for metric spaces
are natural generalizations of the corresponding concepts for $\mathbb{R}$.


A special type of metric space that is particularly important in analysis is a 
normed space, which is a vector space whose metric is derived from a norm. On 
the other hand, every metric space is a special type of topological space, which is 
a set with the notion of an open set but not necessarily a distance. 

The concepts of metric, normed, and topological spaces clarify our previous 
discussion of the analysis of real functions, and they provide the foundation for 
wide-ranging developments in analysis. The aim of this chapter is to introduce 
these spaces and give some examples, but their theory is too extensive to describe 
here in any detail. 


13.1. Metric spaces 


A metric on a set is a function that satisfies the minimal properties we might expect 
of a distance. 


Definition 13.1. A metric $d$ on a set $X$ is a function $d : X \times X \to \mathbb{R}$ such that
for all $x, y, z \in X$:


(1) $d(x,y) \ge 0$ and $d(x,y) = 0$ if and only if $x = y$ (positivity);
(2) $d(x,y) = d(y,x)$ (symmetry);
(3) $d(x,y) \le d(x,z) + d(z,y)$ (triangle inequality).


A metric space $(X,d)$ is a set $X$ with a metric $d$ defined on $X$.




In general, many different metrics can be defined on the same set X, but if the 
metric on X is clear from the context, we refer to X as a metric space. 


Subspaces of a metric space are subsets whose metric is obtained by restricting 
the metric on the whole space. 


Definition 13.2. Let $(X,d)$ be a metric space. A metric subspace $(A, d_A)$ of $(X,d)$
consists of a subset $A \subset X$ whose metric $d_A : A \times A \to \mathbb{R}$ is the restriction of $d$
to $A$; that is, $d_A(x,y) = d(x,y)$ for all $x, y \in A$.


We can often formulate intrinsic properties of a subset $A \subset X$ of a metric space
$X$ in terms of properties of the corresponding metric subspace $(A, d_A)$.


When it is clear that we are discussing metric spaces, we refer to a metric 
subspace as a subspace, but metric subspaces should not be confused with other 
types of subspaces (for example, vector subspaces of a vector space). 


13.1.1. Examples. In the following examples of metric spaces, the verification 
of the properties of a metric is mostly straightforward and is left as an exercise. 


Example 13.3. A rather trivial example of a metric on any set $X$ is the discrete
metric

\[
d(x,y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \ne y. \end{cases}
\]

This metric is nevertheless useful in illustrating the definitions and providing
counterexamples.

Example 13.4. Define $d : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ by

\[
d(x,y) = |x - y|.
\]

Then $d$ is a metric on $\mathbb{R}$. The natural numbers $\mathbb{N}$ and the rational numbers $\mathbb{Q}$ with
the absolute-value metric are metric subspaces of $\mathbb{R}$, as is any other subset $A \subset \mathbb{R}$.


Example 13.5. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by

\[
d(x,y) = |x_1 - y_1| + |x_2 - y_2|, \qquad x = (x_1, x_2), \quad y = (y_1, y_2).
\]

Then $d$ is a metric on $\mathbb{R}^2$, called the $\ell^1$ metric. (Here, “$\ell^1$” is pronounced “ell-one.”)
For example, writing $z = (z_1, z_2)$, we have

\[
\begin{aligned}
d(x,y) &= |x_1 - z_1 + z_1 - y_1| + |x_2 - z_2 + z_2 - y_2| \\
&\le |x_1 - z_1| + |z_1 - y_1| + |x_2 - z_2| + |z_2 - y_2| \\
&\le d(x,z) + d(z,y),
\end{aligned}
\]

so $d$ satisfies the triangle inequality. This metric is sometimes referred to
informally as the “taxicab” metric, since it’s the distance one would travel by taxi on a
rectangular grid of streets.


Example 13.6. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by

\[
d(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \qquad x = (x_1, x_2), \quad y = (y_1, y_2).
\]

Then $d$ is a metric on $\mathbb{R}^2$, called the Euclidean, or $\ell^2$, metric. It corresponds to
the usual notion of distance between points in the plane. The triangle inequality
is geometrically obvious but an analytical proof is non-trivial (see Theorem 13.26
below).


Figure 1. The graph of a function $f \in C([0,1])$ is in blue. A function whose
distance from $f$ with respect to the sup-norm is less than 0.1 has a graph that
lies inside the dotted red lines $y = f(x) \pm 0.1$, e.g., the green graph.


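Example 13.7. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by

\[
d(x,y) = \max\left( |x_1 - y_1|, |x_2 - y_2| \right), \qquad x = (x_1, x_2), \quad y = (y_1, y_2).
\]

Then $d$ is a metric on $\mathbb{R}^2$, called the $\ell^\infty$, or maximum, metric.

Example 13.8. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ for $x = (x_1, x_2)$, $y = (y_1, y_2)$ as follows: if
$(x_1, x_2) \ne k(y_1, y_2)$ for every $k \in \mathbb{R}$, then

\[
d(x,y) = \sqrt{x_1^2 + x_2^2} + \sqrt{y_1^2 + y_2^2};
\]

and if $(x_1, x_2) = k(y_1, y_2)$ for some $k \in \mathbb{R}$, then

\[
d(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}.
\]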


That is, d(x,y) is the sum of the Euclidean distances of x and y from the origin, 
unless x and y lie on the same line through the origin, in which case it is the 
Euclidean distance from x to y. Then d defines a metric on R?. 


In England, d is sometimes called the “British Rail” metric, because all the 
train lines radiate from London (located at 0). To take a train from town x to town 
y, one has to take a train from x to 0 and then take a train from 0 to y, unless x 
and y are on the same line, when one can take a direct train. 
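
The metrics on $\mathbb{R}^2$ from Examples 13.5 to 13.8 are easy to experiment with. The sketch below implements them and brute-force checks the triangle inequality on a small integer grid; this is evidence, not a proof, and the function names (`d1`, `d2`, `dmax`, `british_rail`, `satisfies_triangle_inequality`) and the grid are illustrative assumptions.

```python
import itertools
import math

def d1(x, y):
    """The l^1 (taxicab) metric."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d2(x, y):
    """The Euclidean (l^2) metric."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def dmax(x, y):
    """The l^infinity (maximum) metric."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def british_rail(x, y):
    """Euclidean distance along a common line through the origin, else via the origin."""
    # A vanishing cross product means x and y lie on one line through 0; when x or y
    # is the origin itself, both formulas in the definition give the same value.
    if x[0] * y[1] - x[1] * y[0] == 0:
        return d2(x, y)
    return math.hypot(*x) + math.hypot(*y)

points = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]

def satisfies_triangle_inequality(d, pts, tol=1e-12):
    """Check d(x, y) <= d(x, z) + d(z, y) for all triples, up to rounding tolerance."""
    return all(d(x, y) <= d(x, z) + d(z, y) + tol
               for x, y, z in itertools.product(pts, repeat=3))

for d in (d1, d2, dmax, british_rail):
    print(d.__name__, satisfies_triangle_inequality(d, points))
```

Integer grid points are used so that the collinearity test in `british_rail` is exact; with floating-point coordinates one would need a tolerance there.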


Example 13.9. Let $C(K)$ denote the set of continuous functions $f : K \to \mathbb{R}$,
where $K \subset \mathbb{R}$ is compact; for example, $K = [a,b]$ is a closed, bounded interval. If
$f, g \in C(K)$ define

\[
d(f,g) = \sup_{x \in K} |f(x) - g(x)| = \|f - g\|_\infty, \qquad \|f\|_\infty = \sup_{x \in K} |f(x)|.
\]

Figure 2. The unit balls $B_1(0)$ on $\mathbb{R}^2$ for different metrics: they are the
interior of a diamond ($\ell^1$-norm), a circle ($\ell^2$-norm), or a square ($\ell^\infty$-norm).
The $\ell^\infty$-ball of radius 1/2 is also indicated by the dashed line.


The function $d : C(K) \times C(K) \to \mathbb{R}$ is well-defined, since a continuous function on
a compact set is bounded, and $d$ is a metric on $C(K)$. Two functions are close with
respect to this metric if their values are close at every point $x \in K$. (See Figure 1.)
We refer to $\|f\|_\infty$ as the sup-norm of $f$. Section 13.6 has further discussion.


13.1.2. Open and closed balls. A ball in a metric space is analogous to an 
interval in R. 


Definition 13.10. Let $(X,d)$ be a metric space. The open ball $B_r(x)$ of radius
$r > 0$ and center $x \in X$ is the set of points whose distance from $x$ is less than $r$,

\[
B_r(x) = \{ y \in X : d(x,y) < r \}.
\]

The closed ball $\bar{B}_r(x)$ of radius $r > 0$ and center $x \in X$ is the set of points whose
distance from $x$ is less than or equal to $r$,

\[
\bar{B}_r(x) = \{ y \in X : d(x,y) \le r \}.
\]


The term “ball” is used to denote a “solid ball,” rather than the “sphere” of 
points whose distance from the center x is equal to r. 


Example 13.11. Consider $\mathbb{R}$ with its standard absolute-value metric, defined in
Example 13.4. Then the open ball $B_r(x) = \{ y \in \mathbb{R} : |x - y| < r \}$ is the open interval
of radius $r$ centered at $x$, and the closed ball $\bar{B}_r(x) = \{ y \in \mathbb{R} : |x - y| \le r \}$ is the
closed interval of radius $r$ centered at $x$.




Example 13.12. For $\mathbb{R}^2$ with the Euclidean metric defined in Example 13.6, the
ball $B_r(x)$ is an open disc of radius $r$ centered at $x$. For the $\ell^1$-metric in
Example 13.5, the ball is a diamond of diameter $2r$, and for the $\ell^\infty$-metric in
Example 13.7, it is a square of side $2r$. The unit ball $B_1(0)$ for each of these metrics is
illustrated in Figure 2.


Example 13.13. Consider the space $C(K)$ of continuous functions $f : K \to \mathbb{R}$
on a compact set $K \subset \mathbb{R}$ with the sup-norm metric defined in Example 13.9. The
ball $B_r(f)$ consists of all continuous functions $g : K \to \mathbb{R}$ whose values are within
$r$ of the values of $f$ at every $x \in K$. For example, for the function $f$ shown in
Figure 1 with $r = 0.1$, the open ball $B_r(f)$ consists of all continuous functions $g$
whose graphs lie between the red lines.


One has to be a little careful with the notion of balls in a general metric space, 
because they don’t always behave the way their name suggests. 


Example 13.14. Let $X$ be a set with the discrete metric given in Example 13.3.
Then $B_r(x) = \{x\}$ consists of a single point if $0 < r \le 1$ and $B_r(x) = X$ is the
whole space if $r > 1$. (See also Example 13.44.)
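
The unusual behavior of balls in the discrete metric is easy to reproduce on a finite set. This is a small sketch; the set `X` and the helper names `discrete_metric` and `open_ball` are hypothetical.

```python
def discrete_metric(x, y):
    """The discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def open_ball(X, d, center, r):
    """Open ball: the points of X at distance strictly less than r from center."""
    return {y for y in X if d(center, y) < r}

X = {"a", "b", "c"}
print(open_ball(X, discrete_metric, "a", 0.5))   # radius at most 1: a single point
print(open_ball(X, discrete_metric, "a", 1.0))   # radius exactly 1: still a single point
print(open_ball(X, discrete_metric, "a", 1.5))   # radius greater than 1: the whole space
```

The ball jumps from a single point to the whole space as the radius crosses 1, with nothing in between.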


An another example, what are the open balls for the metric in Example|13.8/ 
A set in a metric space is bounded if it is contained in a ball of finite radius. 


Definition 13.15. Let $(X,d)$ be a metric space. A set $A \subset X$ is bounded if there exist $x \in X$ and $0 < R < \infty$ such that $d(x,y) < R$ for all $y \in A$, meaning that $A \subset B_R(x)$.


Unlike $\mathbb{R}$, or a vector space, a general metric space has no distinguished origin, but the center point of the ball is not important in this definition of a bounded set. The triangle inequality implies that $d(y,z) < R + d(x,y)$ if $d(x,z) < R$, so

$$B_R(x) \subset B_{R'}(y) \quad \text{for } R' = R + d(x,y).$$

Thus, if Definition 13.15 holds for some $x \in X$, then it holds for every $x \in X$.


We can say equivalently that $A \subset X$ is bounded if the metric subspace $(A, d_A)$ is bounded.


Example 13.16. Let $X$ be a set with the discrete metric given in Example 13.3. Then $X$ is bounded since $X = B_r(x)$ if $r > 1$ and $x \in X$.


Example 13.17. A subset $A \subset \mathbb{R}$ is bounded with respect to the absolute-value metric if $A \subset (-R, R)$ for some $0 < R < \infty$.


Example 13.18. Let $C(K)$ be the space of continuous functions $f : K \to \mathbb{R}$ on a compact set defined in Example 13.9. The set $F \subset C(K)$ of all continuous functions $f : K \to \mathbb{R}$ such that $|f(x)| \le 1$ for every $x \in K$ is bounded, since $d(f,0) = \|f\|_\infty \le 1$ for all $f \in F$. The set of constant functions $\{f : f(x) = c \text{ for all } x \in K\}$ isn't bounded, since $\|f\|_\infty = |c|$ may be arbitrarily large.


We define the diameter of a set in an analogous way to Definition 3.5 for subsets of $\mathbb{R}$.




Definition 13.19. Let $(X,d)$ be a metric space and $A \subset X$. The diameter $0 \le \operatorname{diam} A \le \infty$ of $A$ is

$$\operatorname{diam} A = \sup\{d(x,y) : x, y \in A\}.$$

It follows from the definitions that $A$ is bounded if and only if $\operatorname{diam} A < \infty$.
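For a finite set the supremum in the definition of the diameter is a maximum, so it can be computed directly. The sketch below assumes a hypothetical helper `diameter` and uses the four corners of the unit square with the Euclidean metric (`math.dist`) as sample data.

```python
import math
from itertools import product

def diameter(points, d):
    """diam A = sup{d(x, y) : x, y in A}; for a finite set the sup is a max."""
    return max((d(x, y) for x, y in product(points, repeat=2)), default=0.0)

# Corners of the unit square in R^2 with the Euclidean metric.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
diam = diameter(square, math.dist)
print(diam)  # the length of the diagonal, sqrt(2)

# Any point of the set can serve as the center of a ball containing it:
# d(x, y) <= diam A < R for all x, y in A as soon as R > diam A.
R = diam + 1.0
print(all(math.dist(x, y) < R for x in square for y in square))  # True
```

This illustrates why a finite diameter is the same as boundedness: a radius slightly larger than the diameter works for every choice of center in the set.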


The notions of an upper bound, lower bound, supremum, and infimum in R 
depend on its order properties. Unlike properties of R based on the absolute value, 
they do not generalize to an arbitrary metric space, which isn’t equipped with an 
order relation. 


13.2. Normed spaces 


In general, there are no algebraic operations defined on a metric space, only a 
distance function. Most of the spaces that arise in analysis are vector, or linear, 
spaces, and the metrics on them are usually derived from a norm, which gives the 
“length” of a vector. 

We assume that the reader is familiar with the basic theory of vector spaces, 
and we consider only real vector spaces. 


Definition 13.20. A normed vector space $(X, \|\cdot\|)$ is a vector space $X$ together with a function $\|\cdot\| : X \to \mathbb{R}$, called a norm on $X$, such that for all $x, y \in X$ and $k \in \mathbb{R}$:

(1) $0 \le \|x\| < \infty$ and $\|x\| = 0$ if and only if $x = 0$;

(2) $\|kx\| = |k|\,\|x\|$;

(3) $\|x + y\| \le \|x\| + \|y\|$.


The properties in Definition 13.20 are natural ones to require of a length. The length of $x$ is 0 if and only if $x$ is the 0-vector; multiplying a vector by a scalar $k$ multiplies its length by $|k|$; and the length of the “hypotenuse” $x + y$ is less than or equal to the sum of the lengths of the “sides” $x$, $y$. Because of this last interpretation, property (3) is called the triangle inequality. We also refer to a normed vector space as a normed space for short.


Proposition 13.21. If $(X, \|\cdot\|)$ is a normed vector space, then $d : X \times X \to \mathbb{R}$ defined by $d(x,y) = \|x - y\|$ is a metric on $X$.

Proof. The metric properties of $d$ follow directly from the properties of a norm in Definition 13.20. The positivity is immediate. Also, we have

$$d(x,y) = \|x - y\| = \|-(x - y)\| = \|y - x\| = d(y,x),$$
$$d(x,y) = \|x - z + z - y\| \le \|x - z\| + \|z - y\| = d(x,z) + d(y,z),$$

which proves the symmetry of $d$ and the triangle inequality.


If X is a normed vector space, then we always use the metric associated with 
its norm, unless stated specifically otherwise. 


A metric associated with a norm has the additional properties that for all $x, y, z \in X$ and $k \in \mathbb{R}$

$$d(x + z, y + z) = d(x,y), \qquad d(kx, ky) = |k|\,d(x,y),$$




which are called translation invariance and homogeneity, respectively. These properties imply that the open balls $B_r(x)$ in a normed space are rescaled, translated versions of the unit ball $B_1(0)$.


Example 13.22. The set of real numbers $\mathbb{R}$ with the absolute-value norm $|\cdot|$ is a one-dimensional normed vector space.


Example 13.23. The discrete metric in Example 13.3 on $\mathbb{R}$, and the metric in Example 13.8 on $\mathbb{R}^2$ are not derived from a norm. (Why?)


Example 13.24. The space $\mathbb{R}^2$ with any of the norms defined for $x = (x_1, x_2)$ by

$$\|x\|_1 = |x_1| + |x_2|, \qquad \|x\|_2 = \sqrt{x_1^2 + x_2^2}, \qquad \|x\|_\infty = \max(|x_1|, |x_2|)$$

is a normed vector space. The corresponding metrics are the “taxicab” metric in Example 13.5, the Euclidean metric in Example 13.6, and the maximum metric in Example 13.7, respectively.


The norms in Example 13.24 are special cases of a fundamental family of $\ell^p$-norms on $\mathbb{R}^n$. All of the $\ell^p$-norms reduce to the absolute-value norm if $n = 1$, but they are different if $n \ge 2$.


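Definition 13.25. For $1 \le p < \infty$, the $\ell^p$-norm $\|\cdot\|_p : \mathbb{R}^n \to \mathbb{R}$ is defined for $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$ by

$$\|x\|_p = \left(|x_1|^p + |x_2|^p + \dots + |x_n|^p\right)^{1/p}.$$

The $\ell^2$-norm is called the Euclidean norm. For $p = \infty$, the $\ell^\infty$-norm $\|\cdot\|_\infty : \mathbb{R}^n \to \mathbb{R}$ is defined by

$$\|x\|_\infty = \max(|x_1|, |x_2|, \dots, |x_n|).$$

The notation for the $\ell^\infty$-norm is explained by the fact that

$$\|x\|_\infty = \lim_{p \to \infty} \|x\|_p.$$

Moreover, consistent with its name, the $\ell^p$-norm is a norm.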
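The limit that motivates the notation for the sup-norm can be observed numerically. The following is a rough sketch, assuming a hypothetical helper `p_norm` and an arbitrarily chosen sample vector; it is an illustration of the definition, not a proof of the limit.

```python
import math

def p_norm(x, p):
    """l^p-norm of a finite sequence x for 1 <= p <= infinity."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = (3.0, -4.0, 1.0)  # sample vector, chosen for illustration
for p in (1, 2, 4, 16, 64):
    print(p, p_norm(x, p))
print("inf", p_norm(x, math.inf))
```

For this vector the printed values decrease from $\|x\|_1 = 8$ toward $\|x\|_\infty = 4$ as $p$ grows.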


Theorem 13.26. Let $1 \le p \le \infty$. The space $\mathbb{R}^n$ with the $\ell^p$-norm is a normed vector space.

Proof. The space $\mathbb{R}^n$ is an $n$-dimensional vector space, so we just need to verify the properties of the norm.

The positivity and homogeneity of the $\ell^p$-norm follow immediately from its definition. We verify the triangle inequality here only for the cases $p = 1, \infty$.

Let $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ be points in $\mathbb{R}^n$. For $p = 1$, we have

$$\|x + y\|_1 = |x_1 + y_1| + |x_2 + y_2| + \dots + |x_n + y_n|$$
$$\le |x_1| + |y_1| + |x_2| + |y_2| + \dots + |x_n| + |y_n|$$
$$\le \|x\|_1 + \|y\|_1.$$




For $p = \infty$, we have

$$\|x + y\|_\infty = \max(|x_1 + y_1|, |x_2 + y_2|, \dots, |x_n + y_n|)$$
$$\le \max(|x_1| + |y_1|, |x_2| + |y_2|, \dots, |x_n| + |y_n|)$$
$$\le \max(|x_1|, |x_2|, \dots, |x_n|) + \max(|y_1|, |y_2|, \dots, |y_n|)$$
$$\le \|x\|_\infty + \|y\|_\infty.$$

The proof of the triangle inequality for $1 < p < \infty$ is more difficult and is given in Section 13.7.


We can use Definition 13.25 to define $\|x\|_p$ for any $0 < p \le \infty$. However, if $0 < p < 1$, then $\|\cdot\|_p$ doesn't satisfy the triangle inequality, so it is not a norm. This explains the restriction $1 \le p \le \infty$.
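A concrete counterexample shows the failure of the triangle inequality for $0 < p < 1$. The sketch below takes $p = 1/2$ and the two standard basis vectors of $\mathbb{R}^2$; the function name `p_quasi_norm` and the choice of vectors are assumptions made for illustration.

```python
def p_quasi_norm(x, p):
    """The formula of the l^p-norm, evaluated for any p > 0."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p = 0.5
x, y = (1.0, 0.0), (0.0, 1.0)
s = tuple(a + b for a, b in zip(x, y))  # x + y = (1, 1)

lhs = p_quasi_norm(s, p)                        # (1 + 1)^2
rhs = p_quasi_norm(x, p) + p_quasi_norm(y, p)   # 1 + 1
print(lhs, rhs, lhs <= rhs)  # 4.0 2.0 False
```

Here $\|x + y\|_{1/2} = 4$ exceeds $\|x\|_{1/2} + \|y\|_{1/2} = 2$, so property (3) of a norm fails.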


Although the $\ell^p$-norms are numerically different for different values of $p$, they are equivalent in the following sense (see Corollary 13.29).

Definition 13.27. Let $X$ be a vector space. Two norms $\|\cdot\|_a$, $\|\cdot\|_b$ on $X$ are equivalent if there exist strictly positive constants $M \ge m > 0$ such that

$$m\|x\|_a \le \|x\|_b \le M\|x\|_a \quad \text{for all } x \in X.$$


Geometrically, two norms are equivalent if and only if an open ball with respect to either one of the norms contains an open ball with respect to the other. Equivalent norms define the same open sets, convergent sequences, and continuous functions, so there are no topological differences between them.


The next theorem shows that every $\ell^p$-norm is equivalent to the $\ell^\infty$-norm. (See Figure 2.)

Theorem 13.28. Suppose that $1 \le p < \infty$. Then, for every $x \in \mathbb{R}^n$,

$$\|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty.$$

Proof. Let $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$. Then for each $1 \le i \le n$, we have

$$|x_i| \le \left(|x_1|^p + |x_2|^p + \dots + |x_n|^p\right)^{1/p} = \|x\|_p,$$

which implies that

$$\|x\|_\infty = \max\{|x_i| : 1 \le i \le n\} \le \|x\|_p.$$

On the other hand, since $|x_i| \le \|x\|_\infty$ for every $1 \le i \le n$, we have

$$\|x\|_p \le \left(n\|x\|_\infty^p\right)^{1/p} = n^{1/p}\|x\|_\infty,$$

which proves the result.
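The two-sided bound relating the $\ell^p$-norm and the $\ell^\infty$-norm can be spot-checked on random vectors. This is only a numerical sanity check under assumed names (`p_norm`, `sup_norm`, `check_bounds`) with a small floating-point tolerance; it does not replace the proof.

```python
import random

def p_norm(x, p):
    """l^p-norm for finite p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sup_norm(x):
    """l^infinity-norm."""
    return max(abs(t) for t in x)

def check_bounds(n, p, trials=1000, seed=0):
    """Check sup <= p-norm <= n^(1/p) * sup on random vectors in R^n."""
    rng = random.Random(seed)
    tol = 1e-9  # relative slack for rounding errors
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        lower, middle = sup_norm(x), p_norm(x, p)
        upper = n ** (1.0 / p) * lower
        if not (lower <= middle * (1 + tol) and middle <= upper * (1 + tol)):
            return False
    return True

print(check_bounds(n=5, p=3))
# Both bounds are attained: a basis vector gives equality on the left,
# the all-ones vector gives equality on the right.
print(p_norm([1, 0, 0, 0], 2), sup_norm([1, 0, 0, 0]))
print(p_norm([1, 1, 1, 1], 2), 4 ** 0.5 * sup_norm([1, 1, 1, 1]))
```

The last two lines suggest that the constants $1$ and $n^{1/p}$ cannot be improved.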


As an immediate consequence, we get the equivalence of the $\ell^p$-norms.

Corollary 13.29. The $\ell^p$ and $\ell^q$ norms on $\mathbb{R}^n$ are equivalent for every $1 \le p, q \le \infty$.

Proof. We have $n^{-1/q}\|x\|_q \le \|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty \le n^{1/p}\|x\|_q$.


With more work, one can prove that $\|x\|_q \le \|x\|_p$ for $1 \le p \le q \le \infty$, meaning that the unit ball with respect to the $\ell^q$-norm contains the unit ball with respect to the $\ell^p$-norm.




13.3. Open and closed sets 


There are natural definitions of open and closed sets in a metric space, analogous to 
the definitions in R. Many of the properties of such sets in R carry over immediately 
to general metric spaces. 


Definition 13.30. Let $X$ be a metric space. A set $G \subset X$ is open if for every $x \in G$ there exists $r > 0$ such that $B_r(x) \subset G$. A subset $F \subset X$ is closed if $F^c = X \setminus F$ is open.


We can rephrase this definition more geometrically in terms of neighborhoods. 


Definition 13.31. Let $X$ be a metric space. A set $U \subset X$ is a neighborhood of $x \in X$ if $B_r(x) \subset U$ for some $r > 0$.


Definition 13.30 then states that a subset of a metric space is open if and only if every point in the set has a neighborhood that is contained in the set. In particular, a set is open if and only if it is a neighborhood of every point in the set.


Example 13.32. If $d$ is the discrete metric on a set $X$ in Example 13.3, then every subset $A \subset X$ is open, since for every $x \in A$ we have $B_{1/2}(x) = \{x\} \subset A$. Every set is also closed, since its complement is open.


Example 13.33. Consider $\mathbb{R}^2$ with the Euclidean norm (or any other $\ell^p$-norm). If $f : \mathbb{R} \to \mathbb{R}$ is a continuous function, then

$$E = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 < f(x_1)\}$$

is an open subset of $\mathbb{R}^2$. If $f$ is discontinuous, then $E$ needn't be open. We leave the proofs as an exercise.


Example 13.34. If $(X,d)$ is a metric space and $A \subset X$, then $B \subset A$ is open in the metric subspace $(A, d_A)$ if and only if $B = A \cap G$ where $G$ is an open subset of $X$. This is consistent with our previous definition of relatively open sets in $A \subset \mathbb{R}$.


Open sets with respect to one metric on a set need not be open with respect 
to another metric. For example, every subset of R with the discrete metric is open, 
but this is not true of R with the absolute-value metric. 


Consistent with our terminology, open balls are open and closed balls are closed. 


Proposition 13.35. Let $X$ be a metric space. If $x \in X$ and $r > 0$, then the open ball $B_r(x)$ is open and the closed ball $\bar{B}_r(x)$ is closed.

Proof. Suppose that $y \in B_r(x)$ where $r > 0$, and let $\epsilon = r - d(x,y) > 0$. The triangle inequality implies that $B_\epsilon(y) \subset B_r(x)$, which proves that $B_r(x)$ is open. Similarly, if $y \in \bar{B}_r(x)^c$ and $\epsilon = d(x,y) - r > 0$, then the triangle inequality implies that $B_\epsilon(y) \subset \bar{B}_r(x)^c$, which proves that $\bar{B}_r(x)^c$ is open and $\bar{B}_r(x)$ is closed.


The next theorem summarizes three basic properties of open sets.

Theorem 13.36. Let $X$ be a metric space.

(1) The empty set $\emptyset$ and the whole set $X$ are open.

(2) An arbitrary union of open sets is open.




(3) A finite intersection of open sets is open.

Proof. Property (1) follows immediately from Definition 13.30. (The empty set satisfies the definition vacuously: since it has no points, every point has a neighborhood that is contained in the set.)

To prove (2), let $\{G_i \subset X : i \in I\}$ be an arbitrary collection of open sets. If

$$x \in \bigcup_{i \in I} G_i,$$

then $x \in G_i$ for some $i \in I$. Since $G_i$ is open, there exists $r > 0$ such that $B_r(x) \subset G_i$, and then

$$B_r(x) \subset \bigcup_{i \in I} G_i.$$

Thus, the union $\bigcup G_i$ is open.

To prove (3), let $\{G_1, G_2, \dots, G_n\}$ be a finite collection of open sets. If

$$x \in \bigcap_{i=1}^{n} G_i,$$

then $x \in G_i$ for every $1 \le i \le n$. Since $G_i$ is open, there exists $r_i > 0$ such that $B_{r_i}(x) \subset G_i$. Let $r = \min(r_1, r_2, \dots, r_n) > 0$. Then $B_r(x) \subset B_{r_i}(x) \subset G_i$ for every $1 \le i \le n$, which implies that

$$B_r(x) \subset \bigcap_{i=1}^{n} G_i.$$

Thus, the finite intersection $\bigcap G_i$ is open.


The previous proof fails if we consider the intersection of infinitely many open sets $\{G_i : i \in I\}$ because we may have $\inf\{r_i : i \in I\} = 0$ even though $r_i > 0$ for every $i \in I$.


The properties of closed sets follow by taking complements of the corresponding properties of open sets and using De Morgan's laws, exactly as for closed subsets of $\mathbb{R}$.

Theorem 13.37. Let $X$ be a metric space.

(1) The empty set $\emptyset$ and the whole set $X$ are closed.

(2) An arbitrary intersection of closed sets is closed.

(3) A finite union of closed sets is closed.


The following relationships of points to sets are entirely analogous to the corresponding ones for subsets of $\mathbb{R}$.


Definition 13.38. Let $X$ be a metric space and $A \subset X$.

(1) A point $x \in A$ is an interior point of $A$ if $B_r(x) \subset A$ for some $r > 0$.

(2) A point $x \in A$ is an isolated point of $A$ if $B_r(x) \cap A = \{x\}$ for some $r > 0$, meaning that $x$ is the only point of $A$ that belongs to $B_r(x)$.

(3) A point $x \in X$ is a boundary point of $A$ if, for every $r > 0$, the ball $B_r(x)$ contains a point in $A$ and a point not in $A$.




(4) A point $x \in X$ is an accumulation point of $A$ if, for every $r > 0$, the ball $B_r(x)$ contains a point $y \in A$ such that $y \ne x$.


A set is open if and only if every point is an interior point and closed if and 
only if every accumulation point belongs to the set. 


We define the interior, boundary, and closure of a set as follows.

Definition 13.39. Let $A$ be a subset of a metric space. The interior $A^\circ$ of $A$ is the set of interior points of $A$. The boundary $\partial A$ of $A$ is the set of boundary points. The closure of $A$ is $\bar{A} = A \cup \partial A$.

It follows that $x \in \bar{A}$ if and only if the ball $B_r(x)$ contains some point in $A$ for every $r > 0$. The next proposition gives equivalent topological definitions.


Proposition 13.40. Let $X$ be a metric space and $A \subset X$. The interior of $A$ is the largest open set contained in $A$,

$$A^\circ = \bigcup\{G \subset A : G \text{ is open in } X\},$$

the closure of $A$ is the smallest closed set that contains $A$,

$$\bar{A} = \bigcap\{F \supset A : F \text{ is closed in } X\},$$

and the boundary of $A$ is their set-theoretic difference,

$$\partial A = \bar{A} \setminus A^\circ.$$

Proof. Let $A_1$ denote the set of interior points of $A$, as in Definition 13.38, and $A_2 = \bigcup\{G \subset A : G \text{ is open}\}$. If $x \in A_1$, then there is an open neighborhood $G \subset A$ of $x$, so $G \subset A_2$ and $x \in A_2$. It follows that $A_1 \subset A_2$. To get the opposite inclusion, note that $A_2$ is open by Theorem 13.36. Thus, if $x \in A_2$, then $A_2 \subset A$ is a neighborhood of $x$, so $x \in A_1$ and $A_2 \subset A_1$. Therefore $A_1 = A_2$, which proves the result for the interior.

Next, Definition 13.38 and the previous result imply that

$$(\bar{A})^c = (A^c)^\circ = \bigcup\{G \subset A^c : G \text{ is open}\}.$$

Using De Morgan's laws, and writing $G^c = F$, we get that

$$\bar{A} = \left(\bigcup\{G \subset A^c : G \text{ is open}\}\right)^c = \bigcap\{F \supset A : F \text{ is closed}\},$$

which proves the result for the closure.

Finally, if $x \in \partial A$, then $x \in \bar{A} = A \cup \partial A$, and no neighborhood of $x$ is contained in $A$, so $x \notin A^\circ$. It follows that $x \in \bar{A} \setminus A^\circ$ and $\partial A \subset \bar{A} \setminus A^\circ$. Conversely, if $x \in \bar{A} \setminus A^\circ$, then every neighborhood of $x$ contains points in $A$, since $x \in \bar{A}$, and every neighborhood contains points in $A^c$, since $x \notin A^\circ$. It follows that $x \in \partial A$ and $\bar{A} \setminus A^\circ \subset \partial A$, which completes the proof.


It follows from Theorem 13.36 and Theorem 13.37 that the interior $A^\circ$ is open, the closure $\bar{A}$ is closed, and the boundary $\partial A$ is closed. Furthermore, $A$ is open if and only if $A = A^\circ$, and $A$ is closed if and only if $A = \bar{A}$.

Let us illustrate these definitions with some examples, whose verification we 
leave as an exercise. 




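Example 13.41. Consider $\mathbb{R}$ with the absolute-value metric. If $I = (a,b)$ and $J = [a,b]$, then $I^\circ = J^\circ = (a,b)$, $\bar{I} = \bar{J} = [a,b]$, and $\partial I = \partial J = \{a,b\}$. Note that $I = I^\circ$, meaning that $I$ is open, and $J = \bar{J}$, meaning that $J$ is closed. If $A = \{1/n : n \in \mathbb{N}\}$, then $A^\circ = \emptyset$ and $\bar{A} = \partial A = A \cup \{0\}$. Thus, $A$ is neither open ($A \ne A^\circ$) nor closed ($A \ne \bar{A}$). If $\mathbb{Q}$ is the set of rational numbers, then $\mathbb{Q}^\circ = \emptyset$ and $\bar{\mathbb{Q}} = \partial\mathbb{Q} = \mathbb{R}$. Thus, $\mathbb{Q}$ is neither open nor closed. Since $\bar{\mathbb{Q}} = \mathbb{R}$, we say that $\mathbb{Q}$ is dense in $\mathbb{R}$.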


Example 13.42. Let $A$ be the unit open ball in $\mathbb{R}^2$ with the Euclidean metric,

$$A = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}.$$

Then $A^\circ = A$, the closure of $A$ is the closed unit ball

$$\bar{A} = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\},$$

and the boundary of $A$ is the unit circle

$$\partial A = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}.$$


Example 13.43. Let $A$ be the unit open ball with the $x$-axis deleted in $\mathbb{R}^2$ with the Euclidean metric,

$$A = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 < 1,\ y \ne 0\}.$$

Then $A^\circ = A$, the closure of $A$ is the closed unit ball

$$\bar{A} = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\},$$

and the boundary of $A$ consists of the unit circle and the part of the $x$-axis inside it,

$$\partial A = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\} \cup \{(x,0) \in \mathbb{R}^2 : |x| < 1\}.$$


Example 13.44. Suppose that $X$ is a set containing at least two elements with the discrete metric defined in Example 13.3. If $x \in X$, then the unit open ball is $B_1(x) = \{x\}$, and it is equal to its closure $\overline{B_1(x)} = \{x\}$. On the other hand, the closed unit ball is $\bar{B}_1(x) = X$. Thus, in a general metric space, the closure of an open ball of radius $r > 0$ need not be the closed ball of radius $r$.


13.4. Completeness, compactness, and continuity 


A sequence $(x_n)$ in a set $X$ is a function $f : \mathbb{N} \to X$, where $x_n = f(n)$ is the $n$th term in the sequence.

Definition 13.45. Let $(X,d)$ be a metric space. A sequence $(x_n)$ in $X$ converges to $x \in X$, written $x_n \to x$ as $n \to \infty$ or

$$\lim_{n \to \infty} x_n = x,$$

if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that

$$n > N \text{ implies that } d(x_n, x) < \epsilon.$$

That is, $x_n \to x$ in $X$ if $d(x_n, x) \to 0$ in $\mathbb{R}$. Equivalently, $x_n \to x$ as $n \to \infty$ if for every neighborhood $U$ of $x$ there exists $N \in \mathbb{N}$ such that $x_n \in U$ for all $n > N$.




Example 13.46. If $d$ is the discrete metric on a set $X$, then a sequence $(x_n)$ converges in $(X,d)$ if and only if it is eventually constant. That is, there exist $x \in X$ and $N \in \mathbb{N}$ such that $x_n = x$ for all $n > N$; and, in that case, the sequence converges to $x$.


Example 13.47. For $\mathbb{R}$ with its standard absolute-value metric, Definition 13.45 is the definition of the convergence of a real sequence.


As for subsets of R, we can give a sequential characterization of closed sets in 
a metric space. 


Theorem 13.48. A subset $F \subset X$ of a metric space $X$ is closed if and only if the limit of every convergent sequence in $F$ belongs to $F$.

Proof. First suppose that $F$ is closed, meaning that $F^c$ is open. If $(x_n)$ is a sequence in $F$ and $x \in F^c$, then there is a neighborhood $U \subset F^c$ of $x$ which contains no terms in the sequence, so $(x_n)$ cannot converge to $x$. Thus, the limit of every convergent sequence in $F$ belongs to $F$.

Conversely, suppose that $F$ is not closed. Then $F^c$ is not open, and there exists a point $x \in F^c$ such that every neighborhood of $x$ contains points in $F$. Choose $x_n \in F$ such that $x_n \in B_{1/n}(x)$. Then $(x_n)$ is a sequence in $F$ whose limit $x$ does not belong to $F$, which proves the result.


We define the completeness of metric spaces in terms of Cauchy sequences. 


Definition 13.49. Let $(X,d)$ be a metric space. A sequence $(x_n)$ in $X$ is a Cauchy sequence if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that

$$m, n > N \text{ implies that } d(x_m, x_n) < \epsilon.$$

Every convergent sequence is Cauchy: if $x_n \to x$ then given $\epsilon > 0$ there exists $N$ such that $d(x_n, x) < \epsilon/2$ for all $n > N$, and then for all $m, n > N$ we have

$$d(x_m, x_n) \le d(x_m, x) + d(x_n, x) < \epsilon.$$

Complete spaces are ones in which the converse is also true.

Definition 13.50. A metric space is complete if every Cauchy sequence converges.


Example 13.51. If d is the discrete metric on a set X, then (X, d) is a complete 
metric space since every Cauchy sequence is eventually constant. 


Example 13.52. The space $(\mathbb{R}, |\cdot|)$ is complete, but the metric subspace $(\mathbb{Q}, |\cdot|)$ is not complete.
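One way to see the incompleteness of the rationals is to produce a Cauchy sequence of rational numbers whose squares approach 2. The sketch below uses Newton's iteration $x_{k+1} = (x_k + 2/x_k)/2$ with exact rational arithmetic; the choice of this particular iteration and the name `newton_sqrt2` are assumptions made for illustration.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates for x^2 = 2, starting from x_0 = 2."""
    x = Fraction(2)
    terms = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2  # stays in Q: sums and quotients of rationals
        terms.append(x)
    return terms

terms = newton_sqrt2(5)
for x in terms:
    # Each term is an exact fraction; the error in x^2 - 2 shrinks rapidly.
    print(x, float(abs(x * x - 2)))
```

Every term is rational and successive terms get closer together, but no rational number has square 2, so the sequence has no limit in $\mathbb{Q}$ even though it converges in $\mathbb{R}$.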


In a complete space, we have the following simple criterion for the completeness 
of a subspace. 


Proposition 13.53. A subspace $(A, d_A)$ of a complete metric space $(X, d)$ is complete if and only if $A$ is closed in $X$.

Proof. If $A$ is a closed subspace of a complete space $X$ and $(x_n)$ is a Cauchy sequence in $A$, then $(x_n)$ is a Cauchy sequence in $X$, so it converges to $x \in X$. Since $A$ is closed, $x \in A$, which shows that $A$ is complete.




Conversely, if $A$ is not closed, then by Theorem 13.48 there is a convergent sequence in $A$ whose limit does not belong to $A$. Since it converges, the sequence is Cauchy, but it doesn't have a limit in $A$, so $A$ is not complete.


The most important complete metric spaces in analysis are the complete normed 
spaces, or Banach spaces. 


Definition 13.54. A Banach space is a complete normed vector space.

For example, $\mathbb{R}$ with the absolute-value norm is a one-dimensional Banach space. Furthermore, it follows from the completeness of $\mathbb{R}$ that every finite-dimensional normed vector space over $\mathbb{R}$ is complete. We prove this for the $\ell^p$-norms given in Definition 13.25.


Theorem 13.55. Let $1 \le p \le \infty$. The vector space $\mathbb{R}^n$ with the $\ell^p$-norm is a Banach space.

Proof. Suppose that $(x_k)_{k=1}^\infty$ is a sequence of points

$$x_k = (x_{1,k}, x_{2,k}, \dots, x_{n,k})$$

in $\mathbb{R}^n$ that is Cauchy with respect to the $\ell^p$-norm. From Theorem 13.28,

$$|x_{i,j} - x_{i,k}| \le \|x_j - x_k\|_p,$$

so each coordinate sequence $(x_{i,k})_{k=1}^\infty$ is Cauchy in $\mathbb{R}$. The completeness of $\mathbb{R}$ implies that $x_{i,k} \to x_i$ as $k \to \infty$ for some $x_i \in \mathbb{R}$. Let $x = (x_1, x_2, \dots, x_n)$. Then, from Theorem 13.28 again,

$$\|x_k - x\|_p \le C \max\{|x_{i,k} - x_i| : i = 1, 2, \dots, n\},$$

where $C = n^{1/p}$ if $1 \le p < \infty$ or $C = 1$ if $p = \infty$. Given $\epsilon > 0$, choose $N_i \in \mathbb{N}$ such that $|x_{i,k} - x_i| < \epsilon/C$ for all $k > N_i$, and let $N = \max\{N_1, N_2, \dots, N_n\}$. Then $k > N$ implies that $\|x_k - x\|_p < \epsilon$, which proves that $x_k \to x$ with respect to the $\ell^p$-norm. Thus, $(\mathbb{R}^n, \|\cdot\|_p)$ is complete.


The Bolzano-Weierstrass property provides a sequential definition of compact- 
ness in a general metric space. 


Definition 13.56. A subset $K \subset X$ of a metric space $X$ is sequentially compact, or compact for short, if every sequence in $K$ has a convergent subsequence whose limit belongs to $K$.

Explicitly, this condition means that if $(x_n)$ is a sequence of points $x_n \in K$ then there is a subsequence $(x_{n_k})$ such that $x_{n_k} \to x$ as $k \to \infty$ where $x \in K$. Compactness is an intrinsic property of a subset: $K \subset X$ is compact if and only if the metric subspace $(K, d_K)$ is compact.


Although this definition is similar to the one for compact sets in R, there is 
a significant difference between compact sets in a general metric space and in R. 
Every compact subset of a metric space is closed and bounded, as in R, but it is 
not always true that a closed, bounded set is compact. 


First, as the following example illustrates, a set must be complete, not just 
closed, to be compact. (A closed subset of R is complete because R is complete.) 




Example 13.57. Consider the metric space $\mathbb{Q}$ with the absolute value norm. The set $[0,2] \cap \mathbb{Q}$ is a closed, bounded subspace, but it is not compact since a sequence of rational numbers that converges in $\mathbb{R}$ to an irrational number such as $\sqrt{2}$ has no convergent subsequence in $\mathbb{Q}$.


Second, completeness and boundedness are not enough, in general, to imply compactness.

Example 13.58. Consider $\mathbb{N}$, or any other infinite set, with the discrete metric,

$$d(m,n) = \begin{cases} 0 & \text{if } m = n, \\ 1 & \text{if } m \ne n. \end{cases}$$

Then $\mathbb{N}$ is complete and bounded with respect to this metric. However, it is not compact since $x_n = n$ is a sequence with no convergent subsequence, as is clear from Example 13.46.


The correct generalization to an arbitrary metric space of the characterization 
of compact sets in R as closed and bounded replaces “closed” with “complete” and 
“bounded” with “totally bounded,” which is defined as follows. 


Definition 13.59. Let $(X, d)$ be a metric space. A subset $A \subset X$ is totally bounded if for every $\epsilon > 0$ there exists a finite set $\{x_1, x_2, \dots, x_n\}$ of points in $X$ such that

$$A \subset \bigcup_{i=1}^{n} B_\epsilon(x_i).$$
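The difference between bounded and totally bounded can be seen by trying to build a finite cover by $\epsilon$-balls. The following sketch uses a simple greedy selection of centers on finite samples; the function names `greedy_net` and `covers`, the grid sample of the unit square, and the truncation of $\mathbb{N}$ to 50 points are all assumptions made for illustration.

```python
import math

def greedy_net(points, d, eps):
    """Greedily pick centers so every point lies in some open ball B_eps(center)."""
    centers = []
    for p in points:
        # Keep p as a new center only if no existing center is within eps of it.
        if all(d(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

def covers(points, centers, d, eps):
    """True if the open eps-balls around the centers contain every point."""
    return all(any(d(p, c) < eps for c in centers) for p in points)

# Sample of the unit square with the Euclidean metric: a few balls suffice.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
net = greedy_net(grid, math.dist, 0.5)
print(len(net), covers(grid, net, math.dist, 0.5))

# First 50 natural numbers with the discrete metric: for eps <= 1 each ball
# B_eps(n) is {n}, so every point must be its own center.
discrete = lambda m, n: 0 if m == n else 1
print(len(greedy_net(range(50), discrete, 1.0)))  # 50
```

In the discrete case the number of centers grows with the number of points, which is consistent with an infinite discrete space being bounded but not totally bounded.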


The proof of the following result is then completely analogous to the proof of the Bolzano-Weierstrass theorem for $\mathbb{R}$.

Theorem 13.60. A subset $K \subset X$ of a metric space $X$ is sequentially compact if and only if it is complete and totally bounded.


The definition of the continuity of functions between metric spaces parallels the 
definitions for real functions. 


Definition 13.61. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. A function $f : X \to Y$ is continuous at $c \in X$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that

$$d_X(x, c) < \delta \text{ implies that } d_Y(f(x), f(c)) < \epsilon.$$

The function is continuous on $X$ if it is continuous at every point of $X$.


Example 13.62. A function $f : \mathbb{R}^2 \to \mathbb{R}$, where $\mathbb{R}^2$ is equipped with the Euclidean norm $\|\cdot\|$ and $\mathbb{R}$ with the absolute value norm $|\cdot|$, is continuous at $c \in \mathbb{R}^2$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that

$$\|x - c\| < \delta \text{ implies that } |f(x) - f(c)| < \epsilon.$$

Explicitly, if $x = (x_1, x_2)$ and $c = (c_1, c_2)$, this condition reads:

$$\sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2} < \delta$$

implies that

$$|f(x_1, x_2) - f(c_1, c_2)| < \epsilon.$$




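Example 13.63. A function $f : \mathbb{R} \to \mathbb{R}^2$, where $\mathbb{R}^2$ is equipped with the Euclidean norm $\|\cdot\|$ and $\mathbb{R}$ with the absolute value norm $|\cdot|$, is continuous at $c \in \mathbb{R}$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that

$$|x - c| < \delta \text{ implies that } \|f(x) - f(c)\| < \epsilon.$$

Explicitly, if

$$f(x) = (f_1(x), f_2(x)),$$

where $f_1, f_2 : \mathbb{R} \to \mathbb{R}$, this condition reads: $|x - c| < \delta$ implies that

$$\sqrt{[f_1(x) - f_1(c)]^2 + [f_2(x) - f_2(c)]^2} < \epsilon.$$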


The previous examples generalize in a natural way to define the continuity of an $m$-component vector-valued function of $n$ variables, $f : \mathbb{R}^n \to \mathbb{R}^m$. The definition looks complicated if it is written out explicitly, but it is much clearer if it is expressed in terms of metrics or norms.


Example 13.64. Define $F : C([0,1]) \to \mathbb{R}$ by

$$F(f) = f(0),$$

where $C([0,1])$ is the space of continuous functions $f : [0,1] \to \mathbb{R}$ equipped with the sup-norm described in Example 13.9, and $\mathbb{R}$ has the absolute value norm. That is, $F$ evaluates a function $f(x)$ at $x = 0$. Thus $F$ is a function acting on functions, and its values are scalars; such a function, which maps functions to scalars, is called a functional. Then $F$ is continuous, since $\|f - g\|_\infty < \epsilon$ implies that $|f(0) - g(0)| < \epsilon$. (That is, we take $\delta = \epsilon$.)


We also have a sequential characterization of continuity in a metric space. 


Theorem 13.65. Let $X$ and $Y$ be metric spaces. A function $f : X \to Y$ is continuous at $c \in X$ if and only if $f(x_n) \to f(c)$ as $n \to \infty$ for every sequence $(x_n)$ in $X$ such that $x_n \to c$ as $n \to \infty$.


We define uniform continuity similarly to the real case. 


Definition 13.66. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. A function $f : X \to Y$ is uniformly continuous on $X$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that

$$d_X(x, y) < \delta \text{ implies that } d_Y(f(x), f(y)) < \epsilon.$$


The proofs of the following theorems are identical to the proofs we gave for functions $f : A \subset \mathbb{R} \to \mathbb{R}$. First, a function on a metric space is continuous if and only if the inverse images of open sets are open.

Theorem 13.67. A function $f : X \to Y$ between metric spaces $X$ and $Y$ is continuous on $X$ if and only if $f^{-1}(V)$ is open in $X$ for every open set $V$ in $Y$.


Second, the continuous image of a compact set is compact. 


Theorem 13.68. Let $f : K \to Y$ be a continuous function from a compact metric space $K$ to a metric space $Y$. Then $f(K)$ is a compact subspace of $Y$.

Third, a continuous function on a compact set is uniformly continuous.

Theorem 13.69. If $f : K \to Y$ is a continuous function on a compact set $K$, then $f$ is uniformly continuous.




13.5. Topological spaces 


A collection of subsets of a set $X$ with the properties of the open sets in a metric space given in Theorem 13.36 is called a topology on $X$, and a set with such a collection of open sets is called a topological space.


Definition 13.70. Let $X$ be a set. A collection $\mathcal{T} \subset \mathcal{P}(X)$ of subsets of $X$ is a topology on $X$ if it satisfies the following conditions.

(1) The empty set $\emptyset$ and the whole set $X$ belong to $\mathcal{T}$.

(2) The union of an arbitrary collection of sets in $\mathcal{T}$ belongs to $\mathcal{T}$.

(3) The intersection of a finite collection of sets in $\mathcal{T}$ belongs to $\mathcal{T}$.

A set $G \subset X$ is open with respect to $\mathcal{T}$ if $G \in \mathcal{T}$, and a set $F \subset X$ is closed with respect to $\mathcal{T}$ if $F^c \in \mathcal{T}$. A topological space $(X, \mathcal{T})$ is a set $X$ together with a topology $\mathcal{T}$ on $X$.
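On a finite set, the three conditions for a topology can be verified mechanically. The sketch below assumes hypothetical helpers `power_set` and `is_topology`; because the collection is finite, closure under pairwise unions and intersections is enough to give closure under arbitrary unions and finite intersections.

```python
from itertools import chain, combinations

def power_set(X):
    """All subsets of a finite set X, as frozensets."""
    X = list(X)
    subsets = chain.from_iterable(combinations(X, k) for k in range(len(X) + 1))
    return [frozenset(s) for s in subsets]

def is_topology(X, T):
    """Check conditions (1)-(3) for a finite collection T of subsets of X."""
    X = frozenset(X)
    T = {frozenset(G) for G in T}
    if frozenset() not in T or X not in T:
        return False
    if any(not G <= X for G in T):
        return False
    # For a finite collection, pairwise closure implies the general conditions.
    return all(G | H in T and G & H in T for G, H in combinations(T, 2))

X = {1, 2, 3}
print(is_topology(X, power_set(X)))                 # True: every subset is open
print(is_topology(X, [set(), X]))                   # True: only the empty set and X
print(is_topology(X, [set(), {1}, {2}, X]))         # False: {1} U {2} is missing
print(is_topology(X, [set(), {1}, {2}, {1, 2}, X])) # True
```

The third collection fails only because it omits the union $\{1\} \cup \{2\}$, and adding that set repairs it.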


We can put different topologies on a set with two or more elements. If the 
topology on X is clear from the context, then we simply refer to X as a topological 
space and we don’t specify the topology when we refer to open or closed sets. 


Every metric space with the open sets in Definition 13.30 is a topological space; the resulting collection of open sets is called the metric topology of the metric space. There are, however, topological spaces whose topology is not derived from any metric on the space.


Example 13.71. Let $X$ be any set. Then $\mathcal{T} = \mathcal{P}(X)$ is a topology on $X$, called the discrete topology. In this topology, every set is both open and closed. This topology is the metric topology associated with the discrete metric on $X$ in Example 13.3.

Example 13.72. Let $X$ be any set. Then $\mathcal{T} = \{\emptyset, X\}$ is a topology on $X$, called the trivial topology. The empty set and the whole set are both open and closed, and no other subsets of $X$ are either open or closed.

If $X$ has at least two elements, then this topology is different from the discrete topology in the previous example, and it is not derived from a metric. To see this, suppose that $x, y \in X$ and $x \ne y$. If $d : X \times X \to \mathbb{R}$ is a metric on $X$, then $d(x,y) = r > 0$ and $B_r(x)$ is a nonempty open set in the metric topology that doesn't contain $y$, so $B_r(x) \notin \mathcal{T}$.


The previous example illustrates a separation property of metric topologies 
that need not be satisfied by non-metric topologies. 


Definition 13.73. A topological space $(X, \mathcal{T})$ is Hausdorff if for every $x, y \in X$ with $x \ne y$ there exist open sets $U, V \in \mathcal{T}$ such that $x \in U$, $y \in V$ and $U \cap V = \emptyset$.


That is, a topological space is Hausdorff if distinct points have disjoint neigh- 
borhoods. In that case, we also say that the topology is Hausdorff. Nearly all 
topological spaces that arise in analysis are Hausdorff, including, in particular, 
metric spaces. 


Proposition 13.74. Every metric topology is Hausdorff. 




Proof. Let $(X,d)$ be a metric space. If $x, y \in X$ and $x \ne y$, then $d(x,y) = r > 0$, and $B_{r/2}(x)$, $B_{r/2}(y)$ are disjoint open neighborhoods of $x$, $y$.


Compact sets are defined topologically as sets with the Heine-Borel property. 


Definition 13.75. Let $X$ be a topological space. A set $K \subset X$ is compact if every open cover of $K$ has a finite subcover. That is, if $\{G_i : i \in I\}$ is a collection of open sets such that

$$K \subset \bigcup_{i \in I} G_i,$$

then there is a finite subcollection $\{G_{i_1}, G_{i_2}, \dots, G_{i_n}\}$ such that

$$K \subset \bigcup_{k=1}^{n} G_{i_k}.$$


The Heine-Borel and Bolzano-Weierstrass properties are equivalent in every 
metric space. 


Theorem 13.76. A metric space is compact if and only if it is sequentially compact.


We won't prove this result here, but we remark that it does not hold for general 
topological spaces. 


Finally, we give the topological definitions of convergence, continuity, and con- 
nectedness which are essentially the same as the corresponding statements for R. 
We also show that continuous maps preserve compactness and connectedness. 


The definition of the convergence of a sequence is identical to the statement in Proposition 5.9 for $\mathbb{R}$.

Definition 13.77. Let $X$ be a topological space. A sequence $(x_n)$ in $X$ converges to $x \in X$ if for every neighborhood $U$ of $x$ there exists $N \in \mathbb{N}$ such that $x_n \in U$ for every $n > N$.


The following definition of continuity in a topological space corresponds to Definition 7.2 for $\mathbb{R}$ (with the relative absolute-value topology on the domain $A$ of $f$) and Theorem 7.31.

Definition 13.78. Let $f : X \to Y$ be a function between topological spaces $X$, $Y$. Then $f$ is continuous at $x \in X$ if for every neighborhood $V \subset Y$ of $f(x)$, there exists a neighborhood $U \subset X$ of $x$ such that $f(U) \subset V$. The function $f$ is continuous on $X$ if $f^{-1}(V)$ is open in $X$ for every open set $V \subset Y$.
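For finite topological spaces, the open-set definition of continuity can be tested by computing preimages. The sketch below represents a function as a Python dictionary and a topology as a collection of sets; the names `preimage` and `is_continuous` and the two-point example are assumptions made for illustration.

```python
def preimage(f, X, V):
    """f^{-1}(V) for a function given as a dict on the finite set X."""
    return frozenset(x for x in X if f[x] in V)

def is_continuous(f, X, TX, TY):
    """f is continuous iff the preimage of every open set in TY is open in TX."""
    TX = {frozenset(G) for G in TX}
    return all(preimage(f, X, V) in TX for V in TY)

X = {0, 1}
trivial = [set(), {0, 1}]
discrete = [set(), {0}, {1}, {0, 1}]

identity = {0: 0, 1: 1}
constant = {0: 0, 1: 0}

print(is_continuous(identity, X, discrete, trivial))   # True
print(is_continuous(identity, X, trivial, discrete))   # False
print(is_continuous(constant, X, trivial, discrete))   # True
```

The identity map is continuous from the discrete to the trivial topology but not in the other direction, since $\{0\}$ is open in the discrete topology while its preimage is not open in the trivial one.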


These definitions are equivalent to the corresponding “$\epsilon$-$\delta$” definitions in a metric space, but they make sense in a general topological space because they refer only to neighborhoods and open sets. We illustrate them with two simple examples.


Example 13.79. If $X$ is a set with the discrete topology in Example 13.71, then a sequence converges to $x \in X$ if and only if its terms are eventually equal to $x$, since $\{x\}$ is a neighborhood of $x$. Every function $f : X \to Y$ is continuous with respect to the discrete topology on $X$, since every subset of $X$ is open. On the other hand, if $Y$ has the discrete topology, then $f : X \to Y$ is continuous if and only if $f^{-1}(\{y\})$ is open in $X$ for every $y \in Y$.




Example 13.80. Let $X$ be a set with the trivial topology in Example 13.72. Then every sequence converges to every point $x \in X$, since the only neighborhood of $x$ is $X$ itself. As this example illustrates, non-Hausdorff topologies have the unpleasant feature that limits need not be unique, which is one reason why they rarely arise in analysis. If $Y$ has the trivial topology, then every function $f : X \to Y$ is continuous, since $f^{-1}(\emptyset) = \emptyset$ and $f^{-1}(Y) = X$ are open in $X$. On the other hand, if $X$ has the trivial topology and $Y$ is Hausdorff, then the only continuous functions $f : X \to Y$ are the constant functions.
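For finite sets, the continuity statements in Examples 13.79 and 13.80 can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the sets X and Y, the maps f and constant, and the helper names powerset, trivial_topology, preimage, and is_continuous are hypothetical choices that do not appear in the text, and a topology is represented simply as a set of frozensets.

```python
from itertools import chain, combinations


def powerset(points):
    """All subsets of a finite set, as frozensets (the discrete topology)."""
    pts = list(points)
    subsets = chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))
    return {frozenset(s) for s in subsets}


def trivial_topology(points):
    """Only the empty set and the whole space are open."""
    return {frozenset(), frozenset(points)}


def preimage(f, domain, subset):
    """The inverse image of `subset` under a map given as a dict on `domain`."""
    return frozenset(x for x in domain if f[x] in subset)


def is_continuous(f, domain, opens_domain, opens_codomain):
    """Global continuity as in Definition 13.78: preimages of open sets are open."""
    return all(preimage(f, domain, V) in opens_domain for V in opens_codomain)


# Hypothetical finite spaces and maps, chosen only for illustration.
X = {0, 1, 2}
Y = {"a", "b"}
f = {0: "a", 1: "b", 2: "a"}          # a non-constant map X -> Y
constant = {0: "a", 1: "a", 2: "a"}   # a constant map X -> Y

# Discrete topology on X: every map out of X is continuous.
discrete_domain = is_continuous(f, X, powerset(X), powerset(Y))
# Trivial topology on Y: every map into Y is continuous.
trivial_codomain = is_continuous(f, X, trivial_topology(X), trivial_topology(Y))
# Trivial topology on X and a discrete (hence Hausdorff) topology on Y.
nonconstant_continuous = is_continuous(f, X, trivial_topology(X), powerset(Y))
constant_continuous = is_continuous(constant, X, trivial_topology(X), powerset(Y))
```

In this sketch the non-constant map fails the test in the last case because the preimage of {"a"} is {0, 2}, which is not open in the trivial topology on X.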


Our last definition of a connected topological space corresponds to the definition of connected sets of real numbers (with the relative topology).


Definition 13.81. A topological space $X$ is disconnected if there exist nonempty, disjoint open sets $U$, $V$ such that $X = U \cup V$. A topological space is connected if it is not disconnected.
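On a finite set, Definition 13.81 can be tested by brute force, since if $U$ and $V$ are disjoint with union $X$ then $V$ must be the complement of $U$. The sketch below is a minimal illustration; the function name is_disconnected and the three topologies on the two-point set {0, 1} (including the one labelled sierpinski) are assumptions introduced here, not examples from the text.

```python
def is_disconnected(points, opens):
    """Definition 13.81 on a finite space: X = U ∪ V with U, V nonempty, disjoint, open."""
    whole = frozenset(points)
    for U in opens:
        V = whole - U  # the only candidate disjoint from U with U ∪ V = X
        if U and V and V in opens:
            return True
    return False


# Three hypothetical topologies on the two-point set {0, 1}.
two_points = {0, 1}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
trivial = {frozenset(), frozenset({0, 1})}
sierpinski = {frozenset(), frozenset({0}), frozenset({0, 1})}

results = {
    "discrete": is_disconnected(two_points, discrete),
    "trivial": is_disconnected(two_points, trivial),
    "sierpinski": is_disconnected(two_points, sierpinski),
}
```

Only the discrete topology separates the two points into disjoint open sets, so it is the only one of the three that is disconnected.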


The following proof that continuous functions map compact sets to compact sets and connected sets to connected sets is the same as the proofs given in Theorem 7.35 and Theorem 7.32 for sets of real numbers. Note that a continuous function maps compact or connected sets in the opposite direction to open or closed sets, whose inverse image is open or closed.


Theorem 13.82. Suppose that $f : X \to Y$ is a continuous map between topological spaces $X$ and $Y$. Then $f(X)$ is compact if $X$ is compact, and $f(X)$ is connected if $X$ is connected.


Proof. For the first part, suppose that $X$ is compact. If $\{V_i : i \in I\}$ is an open cover of $f(X)$, then since $f$ is continuous $\{f^{-1}(V_i) : i \in I\}$ is an open cover of $X$, and since $X$ is compact there is a finite subcover

\[ \{f^{-1}(V_{i_1}), f^{-1}(V_{i_2}), \dots, f^{-1}(V_{i_n})\}. \]

It follows that

\[ \{V_{i_1}, V_{i_2}, \dots, V_{i_n}\} \]

is a finite subcover of the original open cover of $f(X)$, which proves that $f(X)$ is compact.

For the second part, suppose that $f(X)$ is disconnected. Then there exist nonempty, disjoint open sets $U$, $V$ in $Y$ such that $U \cup V \supset f(X)$. Since $f$ is continuous, $f^{-1}(U)$, $f^{-1}(V)$ are open, nonempty, disjoint sets such that

\[ X = f^{-1}(U) \cup f^{-1}(V), \]

so $X$ is disconnected. It follows that $f(X)$ is connected if $X$ is connected.


13.6. * Function spaces 


There are many function spaces, and their study is a central topic in analysis. We discuss only one important example here: the space of continuous functions on a compact set equipped with the sup norm. We repeat its definition from an earlier example.




Definition 13.83. Let $K \subset \mathbb{R}$ be a compact set. The space $C(K)$ consists of the continuous functions $f : K \to \mathbb{R}$. Addition and scalar multiplication of functions are defined pointwise in the usual way: if $f, g \in C(K)$ and $k \in \mathbb{R}$, then

\[ (f+g)(x) = f(x) + g(x), \qquad (kf)(x) = k\,(f(x)). \]

The sup-norm of a function $f \in C(K)$ is defined by

\[ \|f\|_\infty = \sup_{x \in K} |f(x)|. \]


Since a continuous function on a compact set attains its maximum and minimum values, for $f \in C(K)$ we can also write

\[ \|f\|_\infty = \max_{x \in K} |f(x)|. \]


Thus, the sup-norm on $C(K)$ is analogous to the $\ell^\infty$-norm on $\mathbb{R}^n$. In fact, if $K = \{1, 2, \dots, n\}$ is a finite set with the discrete topology, then it is identical to the $\ell^\infty$-norm.


Our previous results on continuous functions on a compact set can be formulated concisely in terms of this space. The following characterization of uniform convergence in terms of the sup-norm is easily seen to be equivalent to Definition 9.8.


Definition 13.84. A sequence $(f_n)$ of functions $f_n : K \to \mathbb{R}$ converges uniformly on $K$ to a function $f : K \to \mathbb{R}$ if

\[ \lim_{n \to \infty} \|f_n - f\|_\infty = 0. \]
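As a rough numerical illustration of Definition 13.84, the sketch below estimates $\|f_n - f\|_\infty$ by sampling. Everything in it is an assumption made for illustration: the example $f_n(x) = x^n$ on $K = [0, 1/2]$ with limit $f = 0$ is not taken from the text, the helper names are hypothetical, and sampling on a grid only gives a lower bound for a supremum in general. Here the bound happens to be sharp because $x^n$ is increasing and the right endpoint is a grid point.

```python
def sup_norm_on_grid(g, a, b, samples=1024):
    """Estimate sup |g| on [a, b] from samples + 1 equally spaced points.

    In general this is only a lower bound for the true supremum.
    """
    step = (b - a) / samples
    return max(abs(g(a + i * step)) for i in range(samples + 1))


def power(n):
    """The hypothetical example f_n(x) = x**n."""
    return lambda x: x ** n


def distance_to_limit(n):
    """Estimate ||f_n - f||_inf on K = [0, 1/2], where f is the zero function."""
    f_n = power(n)
    return sup_norm_on_grid(lambda x: f_n(x) - 0.0, 0.0, 0.5)


# The estimated distances shrink like (1/2)**n, consistent with uniform convergence.
distances = [distance_to_limit(n) for n in range(1, 11)]
```

The same helper applied on $[0, 1]$ instead of $[0, 1/2]$ would return 1 for every $n$, which reflects the fact that $x^n$ does not converge uniformly to a continuous limit there.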


Similarly, we can rephrase the definition of a uniformly Cauchy sequence in terms of the sup-norm.


Definition 13.85. A sequence $(f_n)$ of functions $f_n : K \to \mathbb{R}$ is uniformly Cauchy on $K$ if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that

\[ m, n > N \text{ implies that } \|f_m - f_n\|_\infty < \epsilon. \]


Thus, the uniform convergence of a sequence of functions is defined in exactly the same way as the convergence of a sequence of real numbers with the absolute value $|\cdot|$ replaced by the sup-norm $\|\cdot\|_\infty$. Moreover, like $\mathbb{R}$, the space $C(K)$ is complete.


Theorem 13.86. The space $C(K)$ with the sup-norm $\|\cdot\|_\infty$ is a Banach space.


Proof. From Theorem 7.15, the sum of continuous functions and the scalar multiple of a continuous function are continuous, so $C(K)$ is closed under addition and scalar multiplication. The algebraic vector-space properties for $C(K)$ follow immediately from those of $\mathbb{R}$.

A continuous function on a compact set is bounded, so $\|\cdot\|_\infty$ is well-defined on $C(K)$. The sup-norm is clearly non-negative, and $\|f\|_\infty = 0$ implies that $f(x) = 0$ for every $x \in K$, meaning that $f = 0$ is the zero function.




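We also have for all $f, g \in C(K)$ and $k \in \mathbb{R}$ that

\[
\begin{aligned}
\|kf\|_\infty &= \sup_{x \in K} |kf(x)| = |k| \sup_{x \in K} |f(x)| = |k|\,\|f\|_\infty, \\
\|f + g\|_\infty &= \sup_{x \in K} |f(x) + g(x)| \\
&\le \sup_{x \in K} \{|f(x)| + |g(x)|\} \\
&\le \sup_{x \in K} |f(x)| + \sup_{x \in K} |g(x)| \\
&= \|f\|_\infty + \|g\|_\infty,
\end{aligned}
\]

which verifies the properties of a norm.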


Finally, a uniformly Cauchy sequence of functions converges uniformly, and the uniform limit of continuous functions is continuous, so $C(K)$ is complete.


For comparison with the sup-norm, we consider a different norm on $C([a,b])$ called the one-norm, which is analogous to the $\ell^1$-norm on $\mathbb{R}^n$.


Definition 13.87. If $f : [a,b] \to \mathbb{R}$ is a Riemann integrable function, then the one-norm of $f$ is

\[ \|f\|_1 = \int_a^b |f(x)| \, dx. \]


Theorem 13.88. The space $C([a,b])$ of continuous functions $f : [a,b] \to \mathbb{R}$ with the one-norm $\|\cdot\|_1$ is a normed space.


Proof. As shown in Theorem 13.86, $C([a,b])$ is a vector space. Every continuous function is Riemann integrable on a compact interval, so $\|\cdot\|_1 : C([a,b]) \to \mathbb{R}$ is well-defined, and we just have to verify that it satisfies the properties of a norm.


Since $|f| \ge 0$, we have $\|f\|_1 = \int_a^b |f| \ge 0$. Furthermore, since $f$ is continuous, Proposition 11.42 shows that $\|f\|_1 = 0$ implies that $f = 0$, which verifies the positivity. If $k \in \mathbb{R}$, then

\[ \|kf\|_1 = \int_a^b |kf| = |k| \int_a^b |f| = |k|\,\|f\|_1, \]

which verifies the homogeneity. Finally, the triangle inequality is satisfied since

\[ \|f + g\|_1 = \int_a^b |f + g| \le \int_a^b \left(|f| + |g|\right) = \int_a^b |f| + \int_a^b |g| = \|f\|_1 + \|g\|_1. \]


Although $C([a,b])$ equipped with the one-norm $\|\cdot\|_1$ is a normed space, it is not complete, and therefore it is not a Banach space. The following example gives a non-convergent Cauchy sequence in this space.


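Example 13.89. Define the continuous functions $f_n : [0,1] \to \mathbb{R}$ by

\[
f_n(x) = \begin{cases}
0 & \text{if } 0 \le x \le 1/2, \\
n(x - 1/2) & \text{if } 1/2 < x < 1/2 + 1/n, \\
1 & \text{if } 1/2 + 1/n \le x \le 1.
\end{cases}
\]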




If $n > m$, we have

\[ \|f_n - f_m\|_1 = \int_{1/2}^{1/2 + 1/m} |f_n - f_m| \le \frac{1}{m}, \]

since $|f_n - f_m| \le 1$. Thus, $\|f_n - f_m\|_1 < \epsilon$ for all $m, n > 1/\epsilon$, so $(f_n)$ is a Cauchy sequence with respect to the one-norm.


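We claim that if $\|f - f_n\|_1 \to 0$ as $n \to \infty$ where $f \in C([0,1])$, then $f$ would have to be

\[
f(x) = \begin{cases}
0 & \text{if } 0 \le x \le 1/2, \\
1 & \text{if } 1/2 < x \le 1,
\end{cases}
\]

which is discontinuous at $1/2$, so $(f_n)$ does not have a limit in $(C([0,1]), \|\cdot\|_1)$.

To prove the claim, note that if $\|f - f_n\|_1 \to 0$, then $\int_0^{1/2} |f| = 0$ since

\[ \int_0^{1/2} |f| = \int_0^{1/2} |f - f_n| \le \int_0^1 |f - f_n| \to 0, \]

and Proposition 11.42 implies that $f(x) = 0$ for $0 \le x \le 1/2$. Similarly, for every $0 < \epsilon < 1/2$, we get that $\int_{1/2+\epsilon}^1 |f - 1| = 0$, so $f(x) = 1$ for $1/2 < x \le 1$.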


The sequence $(f_n)$ is not uniformly Cauchy since $\|f_n - f_m\|_\infty \to 1$ as $n \to \infty$ for every $m \in \mathbb{N}$, so this example does not contradict the completeness of $(C([0,1]), \|\cdot\|_\infty)$.
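The two norms of $f_n - f_m$ in Example 13.89 can be computed exactly, because the difference is continuous, piecewise linear, and does not change sign. The following Python sketch does this with rational arithmetic; the helper names f, breakpoints, one_norm_diff, and sup_norm_diff are hypothetical, and the use of the trapezoid rule on the breakpoints is an implementation choice made here rather than anything in the text.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def f(n, x):
    """The function f_n from Example 13.89, evaluated at a Fraction x in [0, 1]."""
    if x <= HALF:
        return Fraction(0)
    if x < HALF + Fraction(1, n):
        return n * (x - HALF)
    return Fraction(1)


def breakpoints(n, m):
    """Points of [0, 1] where f_n - f_m can change slope (clipped to the interval)."""
    candidates = {Fraction(0), HALF, HALF + Fraction(1, n), HALF + Fraction(1, m), Fraction(1)}
    return sorted({min(c, Fraction(1)) for c in candidates})


def one_norm_diff(n, m):
    """Exact integral of |f_n - f_m| over [0, 1] via the trapezoid rule on breakpoints."""
    pts = breakpoints(n, m)
    total = Fraction(0)
    for left, right in zip(pts, pts[1:]):
        g_left = abs(f(n, left) - f(m, left))
        g_right = abs(f(n, right) - f(m, right))
        total += (right - left) * (g_left + g_right) / 2
    return total


def sup_norm_diff(n, m):
    """Exact maximum of |f_n - f_m|, attained at a breakpoint of a piecewise linear function."""
    return max(abs(f(n, x) - f(m, x)) for x in breakpoints(n, m))
```

For instance, with $m = 2$ fixed, one_norm_diff(n, 2) stays below $1/2$ for every $n > 2$, while sup_norm_diff(n, 2) equals $1 - 2/n$ and so approaches 1, in line with the remark that the sequence is Cauchy in the one-norm but not uniformly Cauchy.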


The $\ell^\infty$-norm and the $\ell^1$-norm on the finite-dimensional space $\mathbb{R}^n$ are equivalent, but the sup-norm and the one-norm on $C([a,b])$ are not. In one direction, we have

\[ \int_a^b |f| \le (b - a) \cdot \sup_{[a,b]} |f|, \]

so $\|f\|_1 \le (b - a)\|f\|_\infty$, and $\|f - f_n\|_\infty \to 0$ implies that $\|f - f_n\|_1 \to 0$. As the following example shows, the converse is not true. There is no constant $M$ such that $\|f\|_\infty \le M\|f\|_1$ for all $f \in C([a,b])$, and $\|f - f_n\|_1 \to 0$ does not imply that $\|f - f_n\|_\infty \to 0$.


Example 13.90. For $n \in \mathbb{N}$, define the continuous function $f_n : [0,1] \to \mathbb{R}$ by

\[
f_n(x) = \begin{cases}
1 - nx & \text{if } 0 \le x \le 1/n, \\
0 & \text{if } 1/n < x \le 1.
\end{cases}
\]


Then $\|f_n\|_\infty = 1$ for every $n \in \mathbb{N}$, but

\[ \|f_n\|_1 = \int_0^{1/n} (1 - nx) \, dx = \left[ x - \frac{1}{2} n x^2 \right]_0^{1/n} = \frac{1}{2n}, \]

so $\|f_n\|_1 \to 0$ as $n \to \infty$.
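The norms in Example 13.90 can be reproduced exactly with rational arithmetic. The sketch below is illustrative only: the names spike and norms are hypothetical, and the trapezoid rule is used because it is exact for a continuous function that is linear between the breakpoints $0$, $1/n$, $1$.

```python
from fractions import Fraction


def spike(n, x):
    """The function f_n from Example 13.90, evaluated at a Fraction x in [0, 1]."""
    return 1 - n * x if x <= Fraction(1, n) else Fraction(0)


def norms(n):
    """Return (sup-norm, one-norm) of f_n, computed exactly.

    f_n is nonnegative, continuous, and linear between the breakpoints 0, 1/n, 1,
    so its maximum is attained at a breakpoint and the trapezoid rule is exact.
    """
    pts = sorted({Fraction(0), Fraction(1, n), Fraction(1)})
    values = [spike(n, x) for x in pts]
    sup_norm = max(values)
    one_norm = sum(
        (right - left) * (v_left + v_right) / 2
        for left, right, v_left, v_right in zip(pts, pts[1:], values, values[1:])
    )
    return sup_norm, one_norm


# The ratio sup-norm / one-norm equals 2n, so no single constant M can bound it.
ratios = [norms(n)[0] / norms(n)[1] for n in range(1, 6)]
```

The unbounded ratios make concrete why there is no constant $M$ with $\|f\|_\infty \le M \|f\|_1$ on $C([0,1])$.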


Thus, unlike the finite-dimensional vector space $\mathbb{R}^n$, an infinite-dimensional vector space such as $C([a,b])$ has many inequivalent norms and many inequivalent notions of convergence.


The incompleteness of $C([a,b])$ with respect to the one-norm suggests that we use the larger space $R([a,b])$ of Riemann integrable functions on $[a,b]$, which includes some discontinuous functions. A slight complication arises from the fact that if $f$ is Riemann integrable and $\int_a^b |f| = 0$, then it does not follow that $f = 0$, so $\|f\|_1 = 0$ does not imply that $f = 0$. Thus, $\|\cdot\|_1$ is not, strictly speaking, a norm on $R([a,b])$. We can, however, get a normed space of equivalence classes of Riemann integrable functions, by defining $f, g \in R([a,b])$ to be equivalent if $\int_a^b |f - g| = 0$. For instance, the function in Example 11.14 is equivalent to the zero-function.


A much more fundamental defect of the space of (equivalence classes of) Riemann integrable functions with the one-norm is that it is still not complete. To get a space that is complete with respect to the one-norm, we have to use the space $L^1([a,b])$ of (equivalence classes of) Lebesgue integrable functions on $[a,b]$. This is another reason for the superiority of the Lebesgue integral over the Riemann integral: it leads to function spaces that are complete with respect to integral norms.


The inclusion of the smaller incomplete space $C([a,b])$ of continuous functions in the larger complete space $L^1([a,b])$ of Lebesgue integrable functions is analogous to the inclusion of the incomplete space $\mathbb{Q}$ of rational numbers in the complete space $\mathbb{R}$ of real numbers.


13.7. * The Minkowski inequality 


Inequalities are essential to analysis. Their proofs, however, may require considerable ingenuity, and there are often many different ways to prove the same inequality. In this section, we complete the proof that the $\ell^p$-spaces are normed spaces by proving the triangle inequality given in Definition 13.25. This inequality is called the Minkowski inequality, and it's one of the most important inequalities in mathematics.

The simplest case is for the Euclidean norm with $p = 2$. We begin by proving the following fundamental Cauchy-Schwarz inequality.


Theorem 13.91 (Cauchy-Schwarz inequality). If $(x_1, x_2, \dots, x_n)$, $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then

\[ \left| \sum_{i=1}^n x_i y_i \right| \le \left( \sum_{i=1}^n x_i^2 \right)^{1/2} \left( \sum_{i=1}^n y_i^2 \right)^{1/2}. \]

Proof. Since $|\sum x_i y_i| \le \sum |x_i| |y_i|$, it is sufficient to prove the inequality for $x_i, y_i \ge 0$. Furthermore, the inequality is obvious if $x = 0$ or $y = 0$, so we assume that at least one $x_i$ and one $y_i$ is nonzero.

For every $\alpha, \beta \in \mathbb{R}$, we have

\[ 0 \le \sum_{i=1}^n (\alpha x_i - \beta y_i)^2. \]


Expanding the square on the right-hand side and rearranging the terms, we get that

\[ 2\alpha\beta \sum_{i=1}^n x_i y_i \le \alpha^2 \sum_{i=1}^n x_i^2 + \beta^2 \sum_{i=1}^n y_i^2. \]




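We choose $\alpha, \beta > 0$ to “balance” the terms on the right-hand side,

\[ \alpha = \left( \sum_{i=1}^n y_i^2 \right)^{1/2}, \qquad \beta = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}. \]

Then division of the resulting inequality by $2\alpha\beta$ proves the theorem.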


The Minkowski inequality for $p = 2$ is an immediate consequence of the Cauchy-Schwarz inequality.


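Corollary 13.92 (Minkowski inequality). If $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then

\[ \left( \sum_{i=1}^n (x_i + y_i)^2 \right)^{1/2} \le \left( \sum_{i=1}^n x_i^2 \right)^{1/2} + \left( \sum_{i=1}^n y_i^2 \right)^{1/2}. \]
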
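Proof. Expanding the square in the following equation and using the Cauchy-Schwarz inequality, we get

\[
\begin{aligned}
\sum_{i=1}^n (x_i + y_i)^2 &= \sum_{i=1}^n x_i^2 + 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2 \\
&\le \sum_{i=1}^n x_i^2 + 2 \left( \sum_{i=1}^n x_i^2 \right)^{1/2} \left( \sum_{i=1}^n y_i^2 \right)^{1/2} + \sum_{i=1}^n y_i^2 \\
&= \left[ \left( \sum_{i=1}^n x_i^2 \right)^{1/2} + \left( \sum_{i=1}^n y_i^2 \right)^{1/2} \right]^2.
\end{aligned}
\]

Taking the square root of this inequality, we obtain the result.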
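A quick numerical sanity check of Theorem 13.91 and Corollary 13.92 is sketched below. It is not a proof and makes several assumptions for illustration: the helper names are hypothetical, the test vectors are pseudo-random, and a small tolerance tol is added to absorb floating-point rounding.

```python
import math
import random


def dot(x, y):
    """The sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))


def euclidean_norm(x):
    """The Euclidean norm (sum of x_i**2) ** (1/2)."""
    return math.sqrt(dot(x, x))


def cauchy_schwarz_holds(x, y, tol=1e-12):
    """Check |sum x_i y_i| <= ||x|| ||y|| up to rounding error."""
    return abs(dot(x, y)) <= euclidean_norm(x) * euclidean_norm(y) + tol


def minkowski2_holds(x, y, tol=1e-12):
    """Check ||x + y|| <= ||x|| + ||y|| for the Euclidean norm, up to rounding error."""
    total = [a + b for a, b in zip(x, y)]
    return euclidean_norm(total) <= euclidean_norm(x) + euclidean_norm(y) + tol


rng = random.Random(0)  # fixed seed so the sample vectors are reproducible
samples = [
    ([rng.uniform(-1, 1) for _ in range(5)], [rng.uniform(-1, 1) for _ in range(5)])
    for _ in range(200)
]
all_ok = all(cauchy_schwarz_holds(x, y) and minkowski2_holds(x, y) for x, y in samples)
```

For parallel vectors such as $x = (1, 2, 3)$ and $y = (2, 4, 6)$ both sides of the Cauchy-Schwarz inequality agree up to rounding, which matches the fact that the choice of $\alpha$, $\beta$ in the proof makes $\alpha x - \beta y = 0$ in that case.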


To prove the Minkowski inequality for general $1 < p < \infty$, we first define the Hölder conjugate $p'$ of $p$ and prove Young’s inequality.


Definition 13.93. If $1 < p < \infty$, then the Hölder conjugate $1 < p' < \infty$ of $p$ is the number such that

\[ \frac{1}{p} + \frac{1}{p'} = 1. \]

If $p = 1$, then $p' = \infty$; and if $p = \infty$ then $p' = 1$.

The Hölder conjugate of $1 < p < \infty$ is given explicitly by

\[ p' = \frac{p}{p - 1}. \]

Note that if $1 < p < 2$, then $2 < p' < \infty$; and if $2 < p < \infty$, then $1 < p' < 2$. The number 2 is its own Hölder conjugate. Furthermore, if $p'$ is the Hölder conjugate of $p$, then $p$ is the Hölder conjugate of $p'$.
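The explicit formula in Definition 13.93 is easy to experiment with. The sketch below is a minimal illustration, assuming the hypothetical function name holder_conjugate and the convention that math.inf stands for $\infty$; exact rational inputs are used so that the identities can be compared without rounding.

```python
import math
from fractions import Fraction


def holder_conjugate(p):
    """Return p' with 1/p + 1/p' = 1, using math.inf for the endpoint cases."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1
    return p / (p - 1)


p = Fraction(3)
p_conj = holder_conjugate(p)               # 3 / (3 - 1) = 3/2
reciprocal_sum = 1 / p + 1 / p_conj        # should equal 1
double_conjugate = holder_conjugate(p_conj)  # should give back p
```

With $p = 3$ this gives $p' = 3/2$, which lies in $(1, 2)$ as the definition predicts for $p > 2$.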


Theorem 13.94 (Young’s inequality). Suppose that $1 < p < \infty$ and $1 < p' < \infty$ is its Hölder conjugate. If $a, b \ge 0$ are nonnegative real numbers, then

\[ ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}. \]

Moreover, there is equality if and only if $a^p = b^{p'}$.


Proof. There are several ways to prove this inequality. We give a proof based on 
calculus. 


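The result is trivial if $a = 0$ or $b = 0$, so suppose that $a, b > 0$. We write

\[ \frac{a^p}{p} + \frac{b^{p'}}{p'} - ab = b^{p'} \left( \frac{1}{p} \frac{a^p}{b^{p'}} + \frac{1}{p'} - \frac{a}{b^{p'-1}} \right). \]

The definition of $p'$ implies that $p'/p = p' - 1$, so that

\[ \frac{a^p}{b^{p'}} = \left( \frac{a}{b^{p'/p}} \right)^p = \left( \frac{a}{b^{p'-1}} \right)^p. \]

Therefore, we have

\[ \frac{a^p}{p} + \frac{b^{p'}}{p'} - ab = b^{p'} f(t), \qquad f(t) = \frac{t^p}{p} + \frac{1}{p'} - t, \qquad t = \frac{a}{b^{p'-1}}. \]

The derivative of $f$ is

\[ f'(t) = t^{p-1} - 1. \]

Thus, for $p > 1$, we have $f'(t) < 0$ if $0 < t < 1$, so $f(t)$ is strictly decreasing there; moreover, $f'(t) > 0$ if $1 < t < \infty$, so $f(t)$ is strictly increasing there. It follows that $f$ has a strict global minimum on $(0, \infty)$ at $t = 1$. Since

\[ f(1) = \frac{1}{p} + \frac{1}{p'} - 1 = 0, \]

we conclude that $f(t) \ge 0$ for all $0 < t < \infty$, with equality if and only if $t = 1$. Furthermore, $t = 1$ if and only if $a = b^{p'-1}$ or $a^p = b^{p'}$. It follows that

\[ \frac{a^p}{p} + \frac{b^{p'}}{p'} - ab \ge 0 \]

for all $a, b \ge 0$, with equality if and only if $a^p = b^{p'}$, which proves the result.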
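Young's inequality and its equality case can be spot-checked numerically. The following sketch is only illustrative: the name young_gap, the sample grid, and the particular choice $p = 3$, $b = 4$ for the equality case are assumptions made here, and floating-point results are compared with a small tolerance.

```python
def young_gap(a, b, p):
    """Return a**p / p + b**q / q - a*b, where q = p / (p - 1) is the Hölder conjugate.

    Young's inequality says this gap is nonnegative for a, b >= 0 and 1 < p < infinity.
    """
    q = p / (p - 1)
    return a ** p / p + b ** q / q - a * b


# Sample the gap on a small grid of nonnegative values for several exponents.
grid = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
exponents = [1.5, 2.0, 3.0, 10.0]
smallest_gap = min(young_gap(a, b, p) for p in exponents for a in grid for b in grid)

# Equality case a**p == b**q: with p = 3 we have q = 3/2, so b = 4 pairs with a = 4 ** 0.5 = 2.
equality_gap = young_gap(2.0, 4.0, 3.0)
```

Moving $a$ away from $b^{p'-1}$ in either direction makes the gap strictly positive, in agreement with the strict minimum of $f$ at $t = 1$ in the proof.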


For $p = 2$, Young’s inequality reduces to the more easily proved inequality $ab \le a^2/2 + b^2/2$ from an earlier proposition.

Before continuing, we give a scaling argument which explains the appearance of the Hölder conjugate in Young’s inequality. Suppose we look for an inequality of the form

\[ ab \le M a^p + N b^q \qquad \text{for all } a, b \ge 0 \]

for some exponents $p$, $q$ and some constants $M$, $N$. Any inequality that holds for all positive real numbers must remain true under rescalings. Rescaling $a \mapsto \lambda a$, $b \mapsto \mu b$ in the inequality (where $\lambda, \mu > 0$) and dividing by $\lambda\mu$, we find that it becomes

\[ ab \le \frac{\lambda^{p-1}}{\mu} M a^p + \frac{\mu^{q-1}}{\lambda} N b^q. \]

We take $\mu = \lambda^{p-1}$ to make the first scaling factor equal to one, and then the inequality becomes

\[ ab \le M a^p + \lambda^r N b^q, \qquad r = (p-1)(q-1) - 1. \]

If the exponent $r$ of $\lambda$ is non-zero, then we can violate the inequality by taking $\lambda$ sufficiently small (if $r > 0$) or sufficiently large (if $r < 0$), since it is clearly impossible to bound $ab$ by $a^p$ for all $b \in \mathbb{R}$. Thus, the inequality can only hold if $r = 0$, which implies that $q = p'$.

This argument does not, of course, prove the inequality, but it shows that the only possible exponents for which an inequality of this form can hold must satisfy $q = p'$. Theorem 13.94 proves that such an inequality does in fact hold in that case provided $1 < p < \infty$.
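The scaling argument can be made concrete with exact arithmetic. In the sketch below, all specific numbers are assumptions chosen for illustration and do not come from the text: the exponents $p = 2$, $q = 3$ (so $q \ne p'$), the constants $M = N = 1$, the base point $(a_0, b_0) = (1, 2)$, and the scale $\lambda = 1/100$. The helper names are likewise hypothetical.

```python
from fractions import Fraction


def scaling_exponent(p, q):
    """The exponent r = (p - 1)(q - 1) - 1 from the scaling argument."""
    return (p - 1) * (q - 1) - 1


def violates(a, b, p, q, M=1, N=1):
    """True if a*b > M*a**p + N*b**q, i.e. the candidate inequality fails at (a, b)."""
    return a * b > M * a ** p + N * b ** q


p, q = 2, 3                      # q differs from the Hölder conjugate p' = 2
r = scaling_exponent(p, q)       # r > 0, so small values of lambda should break the inequality

a0, b0 = Fraction(1), Fraction(2)
lam = Fraction(1, 100)
a, b = lam * a0, lam ** (p - 1) * b0   # the rescaling a -> lambda a, b -> lambda**(p-1) b

fails_at_base_point = violates(a0, b0, p, q)
fails_after_rescaling = violates(a, b, p, q)
```

At the base point the candidate inequality holds, but after rescaling the left-hand side is $2 \times 10^{-4}$ while the right-hand side is $1.08 \times 10^{-4}$, so it fails, as the argument predicts for $r > 0$.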


Next, we use Young’s inequality to deduce Hölder’s inequality, which is a generalization of the Cauchy-Schwarz inequality for $p \ne 2$.


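Theorem 13.95 (Hölder’s inequality). Suppose that $1 < p < \infty$ and $1 < p' < \infty$ is its Hölder conjugate. If $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then

\[ \left| \sum_{i=1}^n x_i y_i \right| \le \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \left( \sum_{i=1}^n |y_i|^{p'} \right)^{1/p'}. \]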


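Proof. We assume without loss of generality that $x_i$, $y_i$ are nonnegative and $x, y \ne 0$. Let $\alpha, \beta > 0$. Then applying Young’s inequality in Theorem 13.94 with $a = \alpha x_i$, $b = \beta y_i$ and summing over $i$, we get

\[ \alpha\beta \sum_{i=1}^n x_i y_i \le \frac{\alpha^p}{p} \sum_{i=1}^n x_i^p + \frac{\beta^{p'}}{p'} \sum_{i=1}^n y_i^{p'}. \]

Then, choosing

\[ \alpha = \left( \sum_{i=1}^n y_i^{p'} \right)^{1/p}, \qquad \beta = \left( \sum_{i=1}^n x_i^p \right)^{1/p'} \]

to “balance” the terms on the right-hand side, dividing by $\alpha\beta$, and using the fact that $1/p + 1/p' = 1$, we get Hölder’s inequality.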


Minkowski’s inequality follows from Hölder’s inequality.


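Theorem 13.96 (Minkowski’s inequality). Suppose that $1 < p < \infty$ and $1 < p' < \infty$ is its Hölder conjugate. If $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then

\[ \left( \sum_{i=1}^n |x_i + y_i|^p \right)^{1/p} \le \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^n |y_i|^p \right)^{1/p}. \]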


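Proof. We assume without loss of generality that $x_i$, $y_i$ are nonnegative and $x, y \ne 0$. We split the sum on the left-hand side as follows:

\[
\begin{aligned}
\sum_{i=1}^n |x_i + y_i|^p &= \sum_{i=1}^n |x_i + y_i| \, |x_i + y_i|^{p-1} \\
&\le \sum_{i=1}^n |x_i| \, |x_i + y_i|^{p-1} + \sum_{i=1}^n |y_i| \, |x_i + y_i|^{p-1}.
\end{aligned}
\]

By Hölder’s inequality, we have

\[ \sum_{i=1}^n |x_i| \, |x_i + y_i|^{p-1} \le \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \left( \sum_{i=1}^n |x_i + y_i|^{(p-1)p'} \right)^{1/p'}, \]

and using the fact that $p' = p/(p-1)$, we get

\[ \sum_{i=1}^n |x_i| \, |x_i + y_i|^{p-1} \le \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \left( \sum_{i=1}^n |x_i + y_i|^p \right)^{1 - 1/p}. \]

Similarly,

\[ \sum_{i=1}^n |y_i| \, |x_i + y_i|^{p-1} \le \left( \sum_{i=1}^n |y_i|^p \right)^{1/p} \left( \sum_{i=1}^n |x_i + y_i|^p \right)^{1 - 1/p}. \]

Combining these inequalities, we obtain

\[ \sum_{i=1}^n |x_i + y_i|^p \le \left[ \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^n |y_i|^p \right)^{1/p} \right] \left( \sum_{i=1}^n |x_i + y_i|^p \right)^{1 - 1/p}. \]

Finally, dividing this inequality by $\left( \sum |x_i + y_i|^p \right)^{1 - 1/p}$, we get the result.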
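The Hölder and Minkowski inequalities for general $p$ can be spot-checked in the same spirit as the case $p = 2$. The sketch below is a numerical illustration rather than a proof; the helper names p_norm, holder_holds, and minkowski_holds, the sample exponents, the pseudo-random test vectors, and the rounding tolerance tol are all assumptions introduced here.

```python
import random


def p_norm(x, p):
    """The quantity (sum |x_i|**p) ** (1/p) for a finite exponent p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def holder_holds(x, y, p, tol=1e-9):
    """Check |sum x_i y_i| <= ||x||_p ||y||_{p'} with p' = p / (p - 1), up to rounding."""
    q = p / (p - 1)
    lhs = abs(sum(a * b for a, b in zip(x, y)))
    return lhs <= p_norm(x, p) * p_norm(y, q) + tol


def minkowski_holds(x, y, p, tol=1e-9):
    """Check ||x + y||_p <= ||x||_p + ||y||_p, up to rounding."""
    total = [a + b for a, b in zip(x, y)]
    return p_norm(total, p) <= p_norm(x, p) + p_norm(y, p) + tol


rng = random.Random(1)  # fixed seed so the sample vectors are reproducible
exponents = [1.5, 2.0, 3.0, 7.0]
vectors = [
    ([rng.uniform(-2, 2) for _ in range(6)], [rng.uniform(-2, 2) for _ in range(6)])
    for _ in range(100)
]
all_hold = all(
    holder_holds(x, y, p) and minkowski_holds(x, y, p)
    for p in exponents
    for x, y in vectors
)
```

Taking $y = x$ gives equality in Minkowski's inequality, since both sides equal $2\|x\|_p$, which is a convenient check that the tolerance is not hiding a genuine failure.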




